perm filename NOTES.END[LSP,JRA] blob sn#287109 filedate 1977-06-13 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00045 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00005 00002	.SEC(The Dynamic Structure of LISP,Dynamic Structure,,P107:)
C00028 00003	.SS(Primitives for LISP,,P230:)
C00043 00004	.SS(%aSM%*: A Simple Machine,%aSM%*,P33:)
C00062 00005	.SS(Implementation of the Primitives,,P234:)
C00077 00006	.SS(Assemblers,Assemblers,P7:)
C00088 00007	.SS(Compilers for Subsets of LISP)
C00112 00008	.SS(Compilation of Conditional Expressions)
C00115 00009	.BEGIN NOFILLGROUP
C00118 00010	.GROUP
C00126 00011	.SS(One-pass Assemblers and Fixups)
C00138 00012	.<<FIXUPS>>
C00150 00013	.SS(Compiling and Interpreting,,P240:)
C00155 00014	
C00165 00015	.SS(A compiler for Simple %3eval%*: The Value Stack,compiler,P32:)
C00178 00016	.SS(A Compiler for Simple %3eval%*,compiler,P249:)
C00194 00017	Here is a partial sketch of %3compile%* operating on %3j <= λ[[xy]f[g[x]h[y]]]%*.
C00198 00018	.SS(Efficient Compilation)
C00203 00019	.SS(Efficiency: Primitive Operations,,P166:)
C00210 00020	.SS(Efficiency: Calling Sequences)
C00223 00021	.SS(Efficiency: Predicates)
C00229 00022	.SS(A Compiler for %3progs%*)
C00234 00023	.SS(Further Optimizations,,P258:)
C00245 00024	.SS(Functional Arguments,,P3:)
C00250 00025	.SS(Macros and Special Forms,macros,P57:)
C00261 00026	.SS(Compilation and  Variables,non-local variables,P5:)
C00277 00027	.SS(Interactive Programming,,P239:)
C00292 00028	.SS(LISP Editors,,P247:)
C00298 00029	.SS(Debugging in LISP,,P168:)
C00311 00030	.SEC(Storage Structures and Efficiency,,,P210:)
C00316 00031	.SS(Bit-tables,,P263:)
C00319 00032	.SS(Vectors and Arrays,,P137:)
C00328 00033	A typical implementation of an array facility in LISP would include
C00331 00034	.SS(Strings and Linear LISP,string processor,P27:)
C00342 00035	.SS(A Compacting Collector for LISP,,P291:)
C00355 00036	.SS(Representations of Complex Data Structures,,P138:)
C00364 00037	.SS(%3rplaca%* and %3rplacd%*,,P171:)
C00373 00038	.SS(Applications of %3rplaca%* and %3rplacd%*,,P155:)
C00387 00039	.SS(Numbers,numbers)
C00397 00040	.SS(Stacks and Threading)
C00404 00041	.SS(A Non-recursive %3read%*)
C00415 00042	.SS(More Applications of Threading)
C00421 00043	.SS(Storage Management and LISP,,P224:)
C00437 00044	.SS(Hash Techniques,,P248:)
C00443 00045	.REQUIRE "NOTES.IMP" SOURCE_FILE
C00446 ENDMK
C⊗;
.SEC(The Dynamic Structure of LISP,Dynamic Structure,,P107:)
.TURN ON "←";

.SS(Introduction)
As things presently stand we have developed the  basic static structure of a LISP 
machine consisting
of %3eval%* and its subfunctions, the I/O routines, the garbage
collector, and an initial symbol table which contains the constants. These
constants include primitive 
functions, %3T%1 and %3NIL%1, and  usually a collection of utility functions
like %3append%1, and %3reverse%1. 
We have two areas of memory:#pointer space, and full word space.

Expressions  to be evaluated are read in, converted to list structure,
and evaluated by %3eval%*.  The evaluation entails  traversing the 
S-expression representation of the form, interpreting the 
information found there as LISP "instructions". 
We have discussed some basic data structures for
implementation of λ-bindings and evaluation,
but we have said little about how the actual execution of the expression
takes place. The essential ingredient involved here  is the handling
of control -- the dynamics of LISP execution.
For example,
how can we implement call-by-value, parameter passing, recursion, and 
conditional expressions?
At a high level, the original %3eval-apply%1 pair of {yonss(P6)}
%2does%1 describe the evaluation mechanism. Given the data structure representation
of an expression, we can mechanically perform the transformations
implied in the %3eval-apply%1 pair. Even more of the detail is made
explicit in the later evaluators of {yonss(P209)} through {yonss(P211)}.
However we must still implement these evaluators on a "real" machine
and, unless the evaluator is  built into the hardware, we must
express the evaluator in terms of more primitive operations. For example,
we cannot "implement" recursion by %2using%1 recursion; we must express
that idea in terms of lower level operations. Obviously this
decomposition must stop somewhere. As J.#McCarthy once said: "Nothing
can be explained to a stone"; we must assume certain primitives are known.

In this chapter we will discuss two layers of "primitive operations"
or %2instructions%1. One layer will
correspond to traditional hardware, and another layer will correspond
to the primitives which we derive from the evaluator of {yonss(P211)}.
Here we discuss the primitives of that section as the  basis for a machine
which executes LISP expressions.   We can describe
the evaluation of a LISP expression as  the execution of a 
sequence of these instructions. The two operations are equivalent: we may either
evaluate the expression or execute the instruction sequence.
There are common instances in which the execution of the instructions
can be considered to be "more efficient" than the evaluation of the expression.

For example, consider the access to a local variable. Each such access
is to the same  location relative to the local environment. That
relative location can be computed easily, but the
evaluator will use a version of %3lookup%1 for every access.
We %2must%1 resort to %3lookup%1 for non-local variables, since
LISP uses dynamic binding and 
the activation environment will typically affect which binding is accessible,
but since the location of any local variable is computable,  we should 
exploit that knowledge when executing our programs.

Several examples also arise in the evaluation of a %3prog%1. For example
a loop typically consists of a static⊗↓By static we mean that the
actual expressions do not change during execution. Using  the fact that
LISP programs are data structures and using
some marginal programming techniques like %3rplaca%1 and %3rplacd%1 ({yonss(P171)})
we can in fact write self-modifying programs. However, such practice is 
not common.← sequence of statements. Each time around the loop an
evaluator will execute the same sequence of instructions. It would be
faster to simply execute the sequence of instructions rather than 
re-interpret each expression. A related efficiency involves the execution of
%3go%1. We assumed in {yonss(P211)} that the evaluator will locate
the label either by searching the body of the %3prog%1 or, perhaps more efficiently,
by searching a computed %3go_list%1. Either case requires a search.
If we can replace the body of a loop with a sequence of primitive instructions,
then we can replace a %3go%1 with a transfer of control to the
beginning of an appropriate block of instructions. Such a transfer operation would
be one of our instructions. 
A problem  related to the execution of loops is the %2recognition%1 of
a loop.
The extent of --or even the presence of-- a loop
which the user is 
controlling by tests and %3go%1's may be  difficult to discover.  If a loop
is controlled by language constructs (%3while,#do,#repeat%1,#etc.) then the 
interpreter should have some chance of improving the execution of the loop.
This, perhaps, is another good reason for removing control of iteration
from the hands of the programmer.  


The translation of an expression into a sequence of instructions is not without
cost. If we wish to evaluate a simple expression only once, then
direct evaluation is usually less time-consuming than translation plus
execution.
However expressions subjected to repeated evaluation can profitably be
translated into   instructions and executed.

The translation part of the process which we have been describing is called
a %2compiler%1.
It
is a mapping from the LISP expressions to 
a sequence of  instructions. When the instructions are
carried out they will have the same effect as %3eval%*, interpreting
the original expression.
A compiler is  a useful tool for increasing the  speed of execution.
J. McCarthy says a compiler allows you to look %2before%1 you leap; we will
show that we can look %2as%1 we leap. That is, we can compile instructions
as we evaluate  expressions; if the expression is evaluated again
then we execute the faster  compiled version.

The relationship between compilation and interpretation should now be
more apparent:  the interpreter performs the evaluation; the
compiler emits instructions which when executed produce the same computational
effect as the evaluator.  
Since the code produced by the compiler is either in  machine language
or in a form closer to the machine than the source program, we can execute
the code much faster. A speed-up factor of thirty to fifty is not uncommon.
Also in LISP we know that programs are stored as S-expressions.
Our compiled code will be machine language, thus freeing a large quantity
of S-expression storage. This will usually make garbage collection 
less time consuming.

Why not compile all programs? We already have seen that we can %3cons%*-up
new expressions to be evaluated while we are running. Still we can force
even those expressions through a compiler before execution⊗↓There are,
however, programs which simply %2cannot%* be compiled. The most obscene
examples involve self-modifying programs;
that is, programs which modify their representation in order to
affect the course of interpretation.  An example is reluctantly
given on {yon(P161)}.←. The answer,
rather, is that 
for debugging and editing of programs it is extremely convenient
to have a structured representation of the program 
in memory. 

This structured representation  also simplifies the discussion of compilation.
It is true that compilers can be written to 
go directly
from M-expression representation to internal machine code⊗↓The  compilers
which perform in this manner are called syntax directed compilers. They
are an instance of a computational scheme called syntax directed computation;
the idea is based on the observation that many algorithms parallel the 
underlying data structure. We have seen this behavior frequently in our
data structure algorithms. For application to compiling and parsing
see ⊗↑[Gri#71]↑ or ⊗↑[Aho#72]↑.←.
Conventional compiler discussions include a description of the syntax analysis
problems, for we  cannot  compile code until we know what the syntactic structure
of the program is. However, syntax analysis is really irrelevant
for a clear understanding of the implementation of a compiler.
Assuming the existence of the structured representation, the compiler
is conceptually very simple.
This structured representation is the S-expr representation in LISP and 
resembles a parse tree in other languages.
When
we wish to run the program at top speed, we can compile the programs.
The compiler can then translate the abstract representation of the program
into machine code.
We shall say more about this view of programming later.


We shall exploit the analogy between compilers and evaluators when we write the
LISP function, ⊗→%3compile%*↔←, which will implement the compiler.
We shall do this in two ways. First, the structure of the %3compile%* function
and its subfunctions will be written to capitalize on the knowledge we
have acquired from writing evaluators. Second,  we will attempt
to abstract from the specific compiler the essence of all LISP compilers without
losing all of the details.
This second task is more difficult because we have to separate two representations
from the specific compiler. We are representing forms as data structures, and we
are also dealing with the representation of a specific machine.
The task %2is%* worth pursuing since we wish to write different compilers
for the same machine and also  a single compiler capable of easy transportation to
other machines. 

The input to
%3compile%* is the representation of a LISP function; the output is a list
which represents a sequence of machine instructions.  Assume that we have
LISP running on %aBrand X%* machine, and we have  written  a
%3compile%*  which produces code for %aBrand X%* machine. Then perform
the following sequence of steps:
.BEGIN TABIT1(15);GROUP

\	1.  Create the S-expression form of %3compile%*.

\	2.  Introduce this translation into the machine, defining %3compile%*.

\	3.  Ask LISP to evaluate:  %3compile[COMPILE]%*.
.END

Since %3compile%* compiles code
for %aBrand X%* machine,  it translates the S-expression representation of its 
argument
into machine code. Therefore the output of step 3   is a list of instructions 
representing the translation of  %3compile%*.  That is, step 3  compiles
the compiler.

A technique called %2⊗→bootstrapping↔←%* is closely related to the process 
just described.
To illustrate this idea,
assume that we have LISP and its compiler running on %aBrand#X%*, and
we wish to implement LISP  on %aBrand Y%*. If we have been careful in
our encoding of the %3compile%* function then %2most%* of %3compile%* is
machine independent; that is, it deals mostly with the structure of the 
LISP expression and
only in a few places deals with the structure of a particular machine.  As we
will see, this is not an unrealistic assumption.
Notice that this is one of our  early programming admonitions:  
encode algorithms in a 
representation-independent style and include representation-dependent
routines to interface with the program. Changing representations  simply
requires changing  those simpler subfunctions. Here the representations are
of machines and the algorithm is a compiling function for LISP.

Let us call those parts of the compiler
which deal with the machine, the ⊗→code generators↔←
or instruction generators. Now if we understand the
machine organization of brands %aX%* and %aY%* then for any instruction on %aBrand#X%*
we should be able to give a  sequence of  instructions having the equivalent
effect on %aBrand#Y%*. We can change the instruction
 generators in %3compile%*
to generate instructions which run on %aBrand#Y%*. We  would have a %3compile%* function,
call it %3compile*%*, running on %aX%* and producing instructions for %aY%*.
Take the S-expr representations of %3eval, apply, read, print, compile*,...%* etc.
and pass these through %3compile*%*; in this way we generate a large segment
of a LISP system which will run on %aY%*. Certain primitives will have to be
supplied to run these instructions on %aY%*, but a very large part of LISP can be
bootstrapped from %aX%* to %aY%*.

Given a compiler and interpreter for a language %7L%41%1 we  can also
bootstrap %7L%41%1 to a  language %7L%42%1. We express the interpreter 
for %7L%42%1 as a program in %7L%41%1. We can then execute programs in %7L%42%1
by interpreting the %7L%42%1 interpreter. We can improve efficiency by
compiling the %7L%42%1 evaluator.     Perhaps we can express the %7L%42%1
compiler in %7L%41%1 or %7L%42%1; in either case we can then compile that compiler.

This chapter  will deal with the implementation of the control structures
of LISP. 
We will describe implementations of these features as we develop
compilers for LISP.
This use of compilers has several motivations:

.BEGIN INDENT 0,3;
.GROUP;
%21.%*  To describe the compiler we must carefully define the machine
which will execute the instructions which the compiler produces.
The instructions generated by the compiler will reference the control primitives of the 
machine, so we might as well build a compiler at the same time we
show the dynamic structure of LISP.
.APART
.GROUP;

%22.%*  The design of the compiler  shows another non-trivial application
of non-numerical
computation. At the same time we will see how simple it is to
describe such complex algorithms in LISP.
.APART
.GROUP;

%23.%*  It will clearly show the relationship between compilation and
evaluation.  That is, the LISP function representing the compiler
will very closely parallel the structure of the interpreter, %3eval%*.
If you understand %3eval%*, then the compiler is easy.
.END


As in the previous chapter, we will  remain as abstract
as possible without losing the necessary details. 
A meaningful
description of compilers entails an understanding of a machine,
so before the actual construction of the compilers, we will
describe a simple machine
with a sufficient instruction set to handle the control structures of LISP.
First we will review and expand the primitives of {yonss(P211)}, emphasizing
their interpretation as machine instructions.
.SS(Primitives for LISP,,P230:)
In our discussion of the evaluators in {yonss(P209)} through {yonss(P211)}
we uncovered more details  involved in evaluation of LISP
expressions. In the final evaluator we identified a dozen or so actions.
The  underlying
idea was to remove recursion and replace that implicit control structure
with  very explicit actions, controlled by a simple loop:

.BEGIN TABIT3(30,38,40);GROUP;SELECT 3;

\loop <= λ[[]prog[[]
\\l\cont[]
\\\go[l] ]]
.END
The variable %3cont%1 was a functional variable, bound to states
and set to the next state by the action of the current state.
This observation marks the beginning  of a machine description. What remains
to be done is to separate the actions of the machine from the instructions
it is executing. That is, some of the details of the state transformations
deal with the  bookkeeping which the machine is doing to discover what the
expression is, and some of the transformations perform the actual evaluation
of the expression. For example, the manipulation of  %3fun%1 and %3args%1
is part of the activity to discover the form of the expression.
The execution of %3send%1 and %3receive%1 is involved with the evaluation.
The parts of the evaluator  involved with the execution of the expression
are called  the instructions of the machine. Supplied with an appropriate
execution device, a sequence of these instructions captures the 
meaning of the evaluation of an expression. It is the business of this section
to review the  evaluators and extract a sufficient set of instructions.
We begin that task with some examples, using %3peval%1 of {yonss(P211)} as
the basic interpreter.

First, the evaluation of a constant %3A%1 involves the recognition that
we have seen a constant; that is part of the control of the evaluator.
We evaluate that constant by %3send[denote[]]%1. The %3denote%1 operation
is still part of the evaluator, but the %3send%1 operation is an instruction.
The execution of %3send[A]%1 performs the evaluation. The %3restore%1 operation
returns the evaluator to its previous state. We must allow for some state-saving
in our repertoire of instructions. The evaluation of a function application,
like %3g[A]%1, involves the evaluation of %3A%1, the calling of %3g%1, and 
a means of returning to the computation surrounding %3g[A]%1. 
Function calls involve several things: we need space to contain the evaluated 
arguments; we need a control mechanism to describe which argument is being
evaluated; we need to suspend a  computation such that we can
execute the function with the evaluated arguments; and we must be able to 
return to the suspended computation when the function has completed its task.

The necessary ingredients are already present in %3peval%1; we need only
extract and package them. Clearly %3alloc_dest%1 is involved in
getting new space for the evaluated arguments. There is a second required
activity since %3alloc_dest%1 always occurs in conjunction with
a %3save%1 of the current %3dest%1. Therefore we define an instruction
named %3alloc%1 which saves the current destination %2and%1 initializes
a new %3dest%1 block.

Each slot of the destination block is filled by an appropriate %3send%1
operation. Examination of the sub-states of %3evalargs%1#({yon(P231)}) 
reveals another
machine instruction: %3next[]%1 is used to increment the destination pointer.

Finally, after all the arguments are evaluated, we must make the destination
block the local environment, and call the function. Thus two more instructions:
%3link%1 will attach the destination block as the local environment, and
and restore the previous dest block;
%3call%1 will call the function, after saving sufficient control information
so that we may return after execution of the function is completed.

For example, consider %3f[g[A];h[B]]%1. Assuming %3f%1 and %3g%1 are 
λ-definitions with formal parameters %3[x;y]%1 and %3[z]%1 respectively, and
%3h%1 is a primitive, then an instruction sequence might be:

.BEGIN GROUP;SELECT 3;nofill;

alloc[(X Y)]; alloc[(Z)]; send[A];
                    link[]; call[G]; next[];
                           alloc[(G1)]; send[B]; link[];
				        call[H]; link[]; call[F]
.END

There are two classes of instructions to break the sequential
flow of a sequence of instructions: we transfer control when we call or
return from a function; and we transfer control when we execute a conditional
expression.

.P241:
Examination of %3ev2%1, %3ev5%1, and %3ev6%1 ({yon(P232)}) 
reveals some of the details of a function call-return sequence.
After saving the current environment, restoring the saved destination, and
saving a continuation point, we passed control to the body of the function.
The instruction sequence, representing the body of the function, will
be terminated by a call on %3ret%1. This instruction will
restore the saved environment and return control to the instruction immediately
after the %3call%1. The saved information is governed by  the variable
named %3control%1, with %3call%1 adding information, and %3ret%1 removing
information. Before showing how instructions are executed and how %3control%1
is manipulated, we will describe the primitives for conditional expressions.

Examination of the details of %3evcond%1 and its associated functions
({yon(P233)}) exhibits more instructions. We use the evaluator
to evaluate a predicate; we then %3receive%1 the result from the  %3dest%1-block.
If that answer is true, we evaluate one path; otherwise we evaluate
another path. We see two instructions here: a test and jump instruction,
which we shall call %3receive_test%1, which tests the  contents
of the current destination slot and jumps to one instruction sequence
if the result is true, and jumps to (usually) another instruction sequence
if the result is false. The second instruction is a means of jumping
 unconditionally to a prescribed instruction sequence. This second instruction is
named %3goto%1.

.GROUP;
.P236:
For example, a conditional expression [p%41%1 → e%41%1; ...;p%4n%1 → e%4n%1]
has a code sequence like:
.BEGIN TABIT2(25,28);GROUP;
\\<instructions for p%41%1>
\\[%3receive_test%1[] → <code for e%41%1>;%3goto[a0]%1]
\\<instructions for p%42%1>
\\	... 
\\<instructions for p%4n%1>
\\[%3receive_test%1[] → <code for e%4n%1>;%3goto[a0]%1]
\\%3err[NO_TRUE_COND_CLAUSE]
\a0  . . .
.END
.APART

Whenever  %3receive_test%1 is true we execute a sequence of instructions
and then transfer out of the conditional using the %3goto%1.
We could have treated conditional expressions like special function
calls, saving %3a0%1 as the continuation and restoring it from %3control%1
instead of using %3goto%1.
However conditional expressions don't require that extra generality⊗↓Our
treatment of conditionals is an instance of "open coding" a function call.
That means we replace a possible %3call-ret%1 with the "in-line"
instruction sequence which makes up the body of the function. This
trick gives faster execution, but takes more space. We will see
another instance of "open-coding" when we compile macros in {yonss(P57)}.←.

We can now give a more detailed picture of a device which can execute this
instruction set. A program will be represented as a sequence of 
instructions. Some of these instructions may be prefaced with labels.
These labels either represent function names or names used within a conditional
expression. Given a sequence of instructions named %3inst_seq%1, we expect
that they be executed in sequence, unless some transfer of control occurs.
For example, the following program  suffices for the execution of such
instruction sequences:
.BEGIN TABIT3(17,33,35);GROUP;SELECT 3;TURN OFF "←";

\loop <= λ[[inst_seq]prog[[i_s;pc]
\\\i_s ← inst_seq;
\\l\[null[i_s] → return[%9`%3halt]]
\\\pc ← first[i_s];
\\\i_s ← rest[i_s];
\\\[is_label[pc] → NIL; %et%3 → pc[]]
\\\go[l] ]]
.END
.BEGIN TURN OFF "←";
If %3loop%1 returns %3HALT%1, then the result of our computation is found
in %3dest%1.
Labels are not executable instructions, and are therefore ignored.
The effect of %3goto%1 is  to replace the current instruction sequence
with the sequence which begins immediately after the label which is the
argument to the %3goto%1. The effect of %3call-ret%1 is a bit more complex.
We describe  %2only%1 the control aspects of %3call%1, leaving the
other details until later.
Let an instance %3call[fn]%1 be the current instruction; and let
%3is%9'%1 be the current instruction sequence. Note that %3is%9'%1 is the
sequence immediately after the call. We save %3is%9'%1 on %3control%1
by %3control#←#concat[is%9'%3;control]%1; then we set %3i_s%1 to the sequence
beginning at %3fn%1. Execution of %3go[l]%1 sends us to label %3l%1 
and we begin executing the body of %3fn%1.

We leave %3fn%1 by executing %3ret%1. This instruction performs
.BEGIN CENTER;
%3i_s#←#first[control];#control#←#rest[control]%1;
.END
and we are back at the instruction following the %3call%1.
.END
Part of the execution of %3call%1 and %3goto%1 involves locating
the desired label. Since we have saved the original instruction sequence
we can search that list for the desired label. We will see more
effective ways for locating labels in {yonss(P7)}.
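
To summarize the control behavior, the effects of %3goto%1, %3call%1, and %3ret%1
might be sketched in the same style as %3loop%1. This is only a sketch:
%3program%1 names the original instruction sequence, %3i_s%1 and %3control%1
are assumed to be accessible as free variables of %3loop%1, and only the
control aspects are shown; the environment and destination manipulations of a
full %3call-ret%1 are omitted.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

goto <= λ[[lab] i_s ← seek[lab;program]]

call <= λ[[fn] prog[[]
        control ← concat[i_s;control];
        i_s ← seek[fn;program] ]]

ret <= λ[[] prog[[]
        i_s ← first[control];
        control ← rest[control] ]]

seek <= λ[[lab;seq][null[seq] → err[%9`%3UNDEFINED_LABEL];
        and[is_label[first[seq]];eq[first[seq];lab]] → rest[seq];
        %et%3 → seek[lab;rest[seq]]]]
.END
%1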
.SS(%aSM%*: A Simple Machine,%aSM%*,P33:)
This section  describes a simple machine which
has a sufficient instruction set to describe the LISP primitives
in terms of a more conventional machine⊗↓See also [Deu#73].←.
Note that this machine is %2not%* necessary for our
understanding of %3eval%*. %3eval%* is self-descriptive. We need  describe
a machine only to discuss implementation and compilation.  This, indeed,
is an objection to describing the meaning of programming languages
in terms of a compiler: you must then understand %2two%* things, the
language %2and%* a machine.

The simple machine, %aSM%*, is similar in organization to the
PDP-10 ⊗↑[DEC#69]↑. We need very few features to  discuss the interesting 
facets of the implementation
of our primitives. Certainly, if we were to undertake a real implementation, many
more instructions would be desirable. Similarly, when we discuss compilation
our %aSM%* suffices, but if we wished to perform %2efficient%* compilation
we would expect to have a better instruction set.  The point here
is to understand basic algorithms. When that is accomplished it is 
reasonable to examine problems of efficiency and details of implementation.
We address some  of the techniques available for optimization of compiler code
in later sections.

%aSM%* has a conventional addressable main memory, including  registers,
%3AC1, AC2, ..., ACn%1. These registers, called %2accumulators%*,
can either be used as special pointer registers or addressed as
memory locations %30%1 through %3n%1.
Each memory location  is assumed to be large enough to contain
two addresses.
For the sake of discussion, assume the word size is 36 bits.
The mapping of a dotted-pair onto an %aSM%* location is straightforward: the %3car%*
maps to the left-half of the word; the %3cdr%*, to the right.
The addressing space for dotted pairs is therefore 2%818%*. 
A memory area is set aside to contain such dotted pairs.
A memory area is also dedicated to  full-word space;
all p-names and numbers are stored there.
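
For illustration, suppose the atom %3A%1 is represented by a pointer to
location %b405%1 and the atom %3B%1 by a pointer to location %b406%1; these
addresses are hypothetical. A word representing the dotted pair %3(A . B)%1
would then appear as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";

		        ⊂ααααααααπαααααααα⊃
		        ~   405  ~   406  ~
		        %αααααααα∀αααααααα$

.END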

Parts of %aSM%* memory can be designated as stacks. Each stack is a contiguous
area of memory, and the current top of a stack is referenced  by one of the
registers, %3P1, ..., Pj%1; these registers are called  
%2stack-pointers%*⊗↓On the PDP-10
a stack pointer  must be one of the %3AC%1 registers.←.
The stacks will be used to contain the partial results of calculations and
will contain the information necessary to implement  the return from
recursive functions.
In most of the compilers we discuss, a single stack  suffices for saving
partial computations and environments, as well as control information. This
single stack will be referred to by %3P%1.


There are only three  classes of instructions necessary to describe
our implementation: instructions for constant generation, instructions for
stack manipulation, and instructions for flow of control.

The control instructions and some of the stack instructions refer to the 
program counter of %aSM%*. This counter is designated as %aPC%*. 
In the following, %3C%1
means "contents of..."; %3ac%* means any accumulator; %3loc%* means
either a memory location or an %3ac%*; and %3mem%* is reserved for memory
references⊗↓Again, in a real PDP-10 %3mem%1 can be an %3ac%1.←. 

.GROUP;
Here are the  instructions:

.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;

MOVEI ac const\C(ac) ← const

PUSH P ac\C(P) ← C(P)+1. %1Increment stack pointer%*
\C(C(P)) ← C(ac).%1 Copy contents of %3ac%*  onto top of stack.%3

POP P ac\C(ac) ← C(C(P)). %1Copy top of stack into %3ac.
\C(P) ← C(P)-1. %1Decrement  stack pointer.%3


.END
.APART

.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;GROUP;
%1The next two instructions are  used in function call-return situations.%*

PUSHJ P loc \C(P) ← C(P)+1. %1Increment stack pointer%*
\C(C(P)) ← C(%aPC%*).%1 Place address following the %3PUSHJ%1 in the stack.%3
\C(%aPC%3) ← loc. %1Change control to location %3loc%1.

%3POPJ P \C(%aPC%3) ← C(C(P)). %1Copy top of stack into %aPC%1.
\%3C(P) ← C(P)-1. %1Decrement  stack pointer.%1

.END
.BEGIN FILL;SELECT 1;
We have ignored some of the details of stack operations;
each stack operation must consider boundary conditions
on the storage allocated for the stack.
Any condition which would violate these bounds must be detectable.
If a stack is allocated in a discontinuous fashion#(⊗↑[Bis#74]↑)
then a storage management decision must be made; if the stacks are of fixed 
size, then an error must be signaled.
.END
.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;GROUP;
.BEGIN INDENT 0,19;FILL;
%3MOVE ac loc \C(ac) ← C(loc).%1
This is an instruction to load a specified %3ac%* with the contents 
of %3loc%*. Note %3loc%1 may be an %3ac%1; e.g. %3MOVE#AC%41%3#AC%42%1.
.END

.BEGIN INDENT 0,19;FILL;
%3MOVEM ac loc \C(loc) ← C(ac).%1 Copy contents of %3ac%1 into %3loc%1.
For example, %3MOVEM#AC%41%3#AC%42%3#=#MOVE#AC%42%3#AC%41%1.
.END
.END
.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;GROUP;
.P153:
%3SUB ac loc\C(ac) ← C(ac) - C(loc).

%3JUMP mem\C(%aPC%*) ← mem. %1Go to location %3mem%*.

%3JUMPF ac mem\%2if %3C(ac)=%ef%1 %2then%3 C(%aPC%*) ← mem;

.BEGIN INDENT 0,19;FILL;
%3JUMPT ac mem\%2if %3C(ac)%c≠%*%Ef%1 %2then %3C(%aPC%*) ← mem.
%1Note that %3JUMPT%1 implements the
coding trick of {yonss(P214)} which maps %et%1 onto everything which is
not false.
.END

.END

These instructions are executed by a machine whose basic execution cycle is
something like:
.BEGIN TABIT2(15,18);GROUP;SELECT 3;TURN OFF "←";
\%al:%3\C(%aIR%3) ← C(C(%aPC%3))
\\C(%aPC%3) ← C(%aPC%3) + 1
\\ %aexecute %3C(%aIR%3)
\\%ago to l
.END
The %aIR%*, or %aInstruction register%1, is an internal register 
used to hold the instruction we are executing. Note that the %aPC%*
register is incremented %2before%* execution of the instruction. If we
incremented %aPC%1 %2after%1 the execution
of the instruction, and the instruction were a JUMP-type
instruction, then the %aPC%1 would get a spurious incrementation.

.P167:
A critical part of LISP evaluation involves procedure calls and returns.
Since we expect to handle recursive calling sequences,
the %3call-ret%* pair ({yon(P241)}), represented 
as %3CALL%1 and %3RET%1, must take this into account. However, there is a
more fundamental requirement of this pair: they must make sure that, on completion of
a %3CALL%*, the %3RET%* can return to the instruction which directly follows the
%3CALL%*.
This requirement can be met by a less comprehensive call, say %3JSR%1,
(for Jump SubRoutine),
which stores the current value of the %aPC%* in a known location. Then the
return, %3JRTH%1, (for Jump THrough),
 need only pick up that saved value and restore it into the
%aPC%*. We could implement this instruction on our machine.  Recall that in the
basic machine cycle  the %aPC%* was incremented %2before%* the
execution of the instruction. Thus if we were  about to execute a %3JSR%1
the %aPC%* is already pointing at the next instruction; all we need to do
is save the current %aPC%*.
So let's assume that %3JSR F%1 stores the %aPC%* in %3F%* and begins execution
in location %3F+1%1. Then:

.BEGIN TABIT1(19);TURN OFF "←";

%3JSR F\%3C(F) ← %3C(%APC%3)%1. Save the %aPC%* in %3F%*.
\%3C(%aPC%*) ← F + 1.%1 Jump to location represented by %3F + 1%*.

%3JRTH F%* \%3C(%APC%*) ← C(F). 
.END

This pair is indeed how several languages perform their calling sequences.
It's fast and efficient; however it is not sufficient for recursive control.
If we always store in a %2fixed%* location, only the 
result of the %2last%* store would be
available and previous values set by prior recursions would have been lost.
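
To make the difficulty concrete, suppose the body of %3F%1 itself contains a
%3JSR F%1; the addresses %3a%41%1 and %3a%42%1 below are hypothetical return
addresses, recorded at the moment each %3JSR%1 is executed:
.BEGIN TABIT2(17,40);SELECT 3;GROUP;

\JSR F\%1; outer call: %3C(F) ← a%41%3
\  . . .\%1; within the body of %3F
\JSR F\%1; recursive call: %3C(F) ← a%42%1, overwriting %3a%41%3
\  . . .
\JRTH F\%1; returns to %3a%42%3
\  . . .
\JRTH F\%1; returns to %3a%42%1 again; %3a%41%1 has been lost
.END
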
What we need is an implementation of the actions of %3control%1.
For purpose of our discussion we can assume that  %3control%1 operates in a 
stack-like fashion⊗↓%1Unless we wish to consider extensions of LISP,
a stack is  sufficient for LISP's control environment.←.
What the %3CALL%* will do is %2push%* the current contents of the %aPC%* onto the
control stack; and %3RET%* will pop off the top element and put it into the %aPC%* 
register⊗↓What 
will be found on the control stack  is a time-sequence of
those procedures which have been entered but have not yet been completed.
Such information is exceptionally useful in debugging programs.←.

The behavior we have just described is that attributed to the %3PUSHJ-POPJ%1 
pair when they are applied to the control stack. We have separated out
the %3CALL-RET%1 pair since the calling process is not always as simple
as %3PUSHJ-POPJ%1. Several things impinge on our decision:

.BEGIN INDENT 7,7;
%21.%1 We want to be able to supply detailed debugging information to the
user. How this will be accomplished will be the topic of {yonss(P168)}.

%22.%1 We want to be able  to freely replace functions
with new definitions. A %3PUSHJ%1 goes to a particular sequence of instructions.

%23.%1 We want to be able to intermix compiled and interpreted programs.
Compiled programs may call interpreted 
programs, and vice versa. Indeed we may even
wish to replace an interpreted (compiled) definition with a compiled (interpreted)
version.

%24.%1 In dealing with functional arguments, we must be able to transfer
control to a function variable. We cannot know where the %3PUSHJ%1 should
transfer.

.END


When an interpreted function calls  a compiled (or primitive)
function, %3eval%* will  look for the indicator,  %3SUBR%*; then retrieve the
machine address of the code and enter via a %3PUSHJ%*. That code should exit
(back to %3eval%*) via a %3POPJ%*, after assuring that any stacks have
been appropriately restored.

.P243:
Compiled functions  call other functions  via  %3CALL%1.
The %3CALL%*  must discover how to call the function: is it a 
%3SUBR%*, an %3EXPR%*, an %3FEXPR%1, etc.?  The function is called and
on completion control  is 
returned to  the address  immediately following  the %3CALL%*.
For example, %3(CALL fn)%1  can be implemented as %3(PUSHJ P DECODE)%1,
where %3P%1 represents the control stack pointer, and %3DECODE%1 represents
a routine to decode the actual procedure call. Within %3decode%1 we
know that the instruction in location %3C(C(P))-1%1 is the actual call instruction; %3decode%1 then can
access the function definition associated with %3fn%1, set up the call, and
then return via a %3POPJ%1.
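
The following is a minimal sketch of such a %3decode%1, not a definitive
implementation. The helpers are hypothetical: %3return_address[]%1 gives the
saved %aPC%1 on the top of the control stack (that is, %3C(C(P))%1), %3fetch%1
reads a memory word, %3callee%1 extracts the function name from a %3CALL%1
instruction, %3transfer%1 jumps into a block of compiled code, and %3aclist[]%1
collects the evaluated arguments from the %3AC%1's.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

decode <= λ[[] prog[[fn]
        fn ← callee[fetch[sub1[return_address[]]]];
        [get[fn;SUBR] → transfer[get[fn;SUBR]];
         get[fn;EXPR] → return[apply[get[fn;EXPR];aclist[]]];
         %et%3 → err[%9`%3UNDEFINED_FUNCTION]] ]]
.END
%1In the %3SUBR%1 case the compiled code exits with its own %3POPJ%1, returning
to the address after the original %3CALL%1; in the %3EXPR%1 case %3decode%1
computes the value interpretively and returns through the same saved address.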

Within any %3CALL%1 or %3PUSHJ%1 we may call any function, including
that function itself.
This brings us to  one of the most important conventions
for %2any%* stack-like call-return sequence:
Whatever
we push onto a stack within the body of a function %6must%* be
popped off before we exit from the function body. That is, the state of any stack
must be transparent to any computations which occur within the function.
This is called %2⊗→stack synchronization↔←%*.
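
For example, a function body which pushed a partial result but failed to pop it
before exiting would return through the wrong address: the %3POPJ%1 would load
the %aPC%1 from the leftover datum rather than from the saved return address.
A synchronized body brackets its temporary use of the stack:
.BEGIN TABIT2(10,45);SELECT 3;GROUP;

\ (PUSH P 1)\%1; save a partial result%3
\   . . .
\ (POP P 2)\%1; every %3PUSH%1 is matched by a %3POP%3
\ (POPJ P)\%1; the top of %3P%1 is again the return address
.END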

Usually the effect of %3RET%1 is identical to %3POPJ%1; however, it is
conceivable that complex returns will require
special care.  The basic idea  in this discussion is that we will
supply two similar, compatible, but not identical call-return sequences;
%3PUSHJ-POPJ%1 is fast and simple. The other, %3CALL-RET%1 is more general but more costly
to invoke.


In the next section we
will reconcile LISP primitives with the instruction set supplied
on %aSM%1.
.SS(Implementation of the Primitives,,P234:)
As with any representation problem, several choices are available.
We will begin our use of  %aSM%* with a study of call-by-value function calls;
later we will discuss  other calling sequences.
We will discuss two general implementation techniques. The first is applicable
to machines without the special %3AC%1's of the %aSM%1.

.P244:
First, we will assume only that we are able to simulate a stack. All the
operations occur on the stack. Constants will be generated by pushing the
representation on the top of the stack, essentially creating a %3dest%1 block.
A function call, %3f[t%41%3;#...#;t%4n%3]%1,
expects its arguments as the top %3n%1 elements of the stack, 
with the value of %3t%4n%1 on the top of the stack, and the other values
below.
As the function is called, the %3dest%1 block on the top of the stack
becomes the local environment. The function replaces the top %3n%1 elements
 with the value of the function, thus %3send%1-ing its value to the destination of
the caller. 
This model handles only a restricted subset of  LISP, but it is 
a very useful subset.
It will develop into a more robust example as the chapter progresses.
The  technique is extendible to support the implementation 
model we developed in {yonss(P148)}.

.GROUP;
Here's an example of the implementation for the expression %3f[g[A];C;h[B]]%1:
.BEGIN TABIT2(10,45);SELECT 3;GROUP;

\ (PUSH P (QUOTE A))\%1; make argument for call on %3g
\ (CALL  G)\%1; call the function%3
\ (PUSH P (QUOTE C))\%1; place second argument%3
\ (PUSH P (QUOTE B))
\ (CALL H)\%1; %3h%1 only uses (and removes) %3B
\ (CALL  F)\%1; after the call, %3f[g[A];C;h[B]]%1 is 
\\;     on the top of the stack.
.END
.APART

Now we will give implementations of the LISP primitives which result
in reasonably efficient code on the %aSM%1, and which also
reflect several practices applied in current LISP implementations.
We will take advantage of the existence of the special %3AC%1's;
the usual hardware implementation of such special registers
allows access to their contents in less time than typical
stack references⊗↓There is a question whether such special registers 
should be considered good architecture or a trick. The Burroughs
6700-7700 uses special hardware to decrease access time to the initial
stack segment. The PDP-10 uses special registers. One can
argue that such special tricks belong in the hardware, and that the
machine presented to the programmer be correspondingly more
uniform ⊗↑[Dor#76]↑.←.

Since the creating, saving, and restoring of destination blocks can be expensive,
we will try to minimize those kinds of activities. We will use
our special registers %3AC1%1, through %3ACn%1 to build parameter lists
and pass parameters. This  entails
several conventions.
We will try to use the accumulators as the destination block.
Our early  compilers will be sufficiently weak that this desire can
be met. Later we will have to modify our stand slightly.

The actual parameters  for a function call,
%3f[t%41%3;#...#;t%4n%3]%1, will be developed in %3AC1%1 through %3ACn%1.
In the early compilers we will also pass the evaluated parameters
to the called function using the accumulators. Thus values will tend to stay
in the %3AC%1's unless forced out. They can be forced out by %3alloc%1
since a call to %3alloc%1 is supposed to save the current %3dest%1.
The interplay between %3next%1, %3link%1, and %3send%1 requires care.  

.P235:
We will assume
that we are compiling for single valued functions, and therefore 
we must resolve the question of where to put the value of a function.
Again consider
%3f[t%41%3;#...#;t%4n%3]%1; we might expect that each %3t%4i%1 be responsible
for placing its result in the proper %3ACi%1. Indeed that is the spirit
of the %3send%1 operation; it knows where the result should be placed.
This strategy requires some careful register allocation if it is to
be carried out successfully. We will postpone this discussion for a while.

There is a simpler solution available: a function always returns its
value in %3AC1%1 and leaves  the register allocation up to the
calling function. There are at least two strategies here:


.GROUP;
.BEGIN CENTERIT;SELECT 2;

←Convention 1
.END
We try to build the %3dest%1 block in the %3AC%1's and also use the %3AC%1's
to pass parameters and values.

.BEGIN INDENT 10,10,10;
A function call, %3f[t%41%3; ...;t%4n%3]%1, expects
its arguments to be presented in %3AC1%1 through %3ACn%1. 
We try to compute the values of %3t%4i%1 directly in %3ACi%1.
This is easy if %3t%4i%1 is a constant; if %3t%4i%1 is a function call
on %3g%1, we save %3AC1%1 through %3ACi-1%1; set up the arguments
to %3g%1; perform the call, returning the result in %3AC1%1; move the
result to %3ACi%1; and restore the saved values of the %3t%1's.
.END
.APART
.GROUP
.BEGIN CENTERIT;SELECT 2;

←Convention 2
.END
We try to build the %3dest%1 block in the top of the stack, using the
%3AC%1's for passing parameters and returning values.

.BEGIN INDENT 10,10,10;
A function call, %3f[t%41%3; ...;t%4n%3]%1, expects its arguments to
be presented in %3AC1%* through %3ACn%*.
As we compute each %3t%4i%1, we store
the result on the stack  %3P%*. 
Thus the execution sequence should be: 
.END

.BEGIN centerit;
←compute value of %3t%41%1, push onto stack %3P%*.

← .  .  .
←compute value of %3t%4n-1%1, push onto stack %3P%*.
←compute value of %3t%4n%1, move into %3ACn%*.
.END
.APART

.BEGIN INDENT 10,10,10;GROUP
After this computation the values,
V%4n-1%*, ..., V%41%*, 
of the arguments are stored
from top to bottom in %3P%* with V%4n%1 in %3ACn%1.
Thus to complete the function invocation, we need only pop the arguments
into the %3AC%*'s in the correct order and call %3f%1.
We did not push V%4n%1 since we expected to pass the parameters to %3f%1
in %3AC1%1 through %3ACn%1.
.END

.BEGIN INDENT 10,10,10;GROUP
.BEGIN CENTERIT;SELECT 2;

←General conventions
.END
.P174:
 When a function completes evaluation, it is to place its value
in %3AC1%*. Nothing can be assumed about the contents of any other %3AC%*. If an 
%3AC%*  contains information we need then it must be saved on the 
stack %2before%*  calling the function.


Instead of referring to %3AC1, AC2, ..., ACn%* we will simply
use the numbers, %31, 2,#...,#n%* in the instructions.
.END


We give an example of both conventions
for the expression %3f[g[A];C;h[B]]%1.
We use a list representation of the instructions and code sequences in preparation
for future discussions.

.BEGIN TABIT2(10,45);SELECT 3;GROUP;


\((MOVEI 1 (QUOTE A))\%1; make argument for call on %3g
\ (CALL  G)\%1; call the function%3
\ (MOVEI 2 (QUOTE C))\%1; place second argument%3
\ (PUSH P 1)\%1; but now we have to save the values%3
\ (PUSH P 2)\%1;  since we must compute %3h[B]
\ (MOVEI 1 (QUOTE B))
\ (CALL H)
\ (MOVE 3 1)\%1; move the result to %3AC3
\ (POP P 2)\%1; restore %3AC2%1 and %3AC1%1 in the correct order%3
\ (POP P 1)
\ (CALL  F)  )
.CENTER;SELECT 2;
Convention 1
.END

.BEGIN TABIT2(10,45);SELECT 3;GROUP;


\((MOVEI 1 (QUOTE A))\%1; make argument for call on %3g
\ (CALL  G)\%1; call the function%3
\ (PUSH P 1)\%1; save the value%3
\ (MOVEI 1 (QUOTE C))
\ (PUSH P 1)\%1; save the value%3
\ (MOVEI 1 (QUOTE B))
\ (CALL H)
\ (MOVE 3 1)\%1; don't need to save the value%3
\ (POP P 2)\%1;     since this is the last argument.%3
\ (POP P 1)
\ (CALL  F)  )
.CENTER;SELECT 2;
Convention 2
.END
Neither compiling convention produces optimal code for all occasions.
If the parameter list to a function call contains only constants, 
then the first convention
produces better code. If there are many nested function calls then
the first convention may produce very bad code. We will worry more about efficiency after we
develop the basic compiling algorithms.
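
The remark about constant parameter lists can be made concrete with %3f[A;B]%1.
The first convention can deposit both arguments directly into the %3AC%1's,
while the second routes each value through %3AC1%1 and the stack. The two
sequences below are hand-written sketches in the spirit of the examples above,
not the output of any particular compiler:
.BEGIN TABIT2(10,45);SELECT 3;GROUP;

\((MOVEI 1 (QUOTE A))\%1; convention 1: three instructions%3
\ (MOVEI 2 (QUOTE B))
\ (CALL F)  )

\((MOVEI 1 (QUOTE A))\%1; convention 2%3
\ (PUSH P 1)
\ (MOVEI 1 (QUOTE B))
\ (MOVE 2 1)
\ (POP P 1)
\ (CALL  F)  )\%1; six instructions for the same call
.END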

At the highest level, our compiler will generate code for the
%3alloc-link-call-...%1 machine. But frequently we will express the
code in terms of one of our more traditional representations.

The output from the compiler
is to be a list of instructions,  in the order which we
would expect to execute them.  Each instruction is a list: an operation
followed by as many elements as are required by that operation.
We can execute the compiled code by simulating the actions of our machine
on each element of the sequence. However it is more efficient to 
translate this compiler output further, producing a sequence of actual 
machine instructions, placed in memory and suitable for execution by the
hardware processing unit.
We  will allocate an area of memory which can receive the  processed
compiler output.
This area is usually called %2⊗→Binary Program Space↔←%* (BPS).  
The translation program
which takes the  output from the compiler and converts it into actual
machine instructions in    BPS   is called an assembler. 


.SS(Assemblers,Assemblers,P7:)
In {yonss(P230)} we gave an abstract description of an algorithm for
executing sequences of instructions.
In this section we discuss the mechanism for getting the LISP list, representing
instructions, turned into real instructions in Binary Program Space.
Part of the process involves the actual instructions; 
before a machine can execute an instruction it must be transformed into a
numerical code which the machine understands.
Part of the process involves
establishing a link between the code in memory and the LISP evaluator;
when calling a routine which has been compiled, the evaluator must know where
to find it. Therefore, part of the process involves mapping the labels to
locations in the machine.

There are two  alternatives available to solve these problems.
We might incorporate these operations into the compiler. Then the
output from the compiler would go directly to locations in  BPS.
We invoke this solution in {yonss(P240)} when we combine compiling
and interpreting. With the traditional approach, direct loading of compiled
code would  require re-compilation  whenever we wished to initiate a new
LISP process which needed that function.

The alternative is to compile the functions onto an external medium; then
we can load the compiled code at any time.
The premise here is that
compilation is an expensive operation relative to the cost of loading
compiled code.
In this alternative scheme, the tasks of loading, locating labels, and
 linking the code with other modules, are all 
accomplished by  a program called an %2assembler%1. After the assembler has
placed the decoded assembly language in BPS, it indicates that the value
of the function name is the location of the assembled code.

One of the arguments to the assembler should be the  representation
of the program.  One of its arguments should also describe where  in ⊗→BPS↔← 
we wish the assembled code to be located. We should also have access
to an initial symbol table, describing the  pre-defined
symbol names.  These pre-defined names might include information
about the actual machine locations for the utility functions, the values
of special stacks or registers which the compiler uses internally, and
it must also include a special list giving a correspondence between
the names like %3ALLOC%1 or %3PUSHJ%1 and the actual numbers which the hardware
uses in interpreting the instruction⊗↓A hardware machine is just another
interpreter like %3eval%1. It is usually not recursive, but performs more
like %3loop%1 in the %3prog%1-evaluator.←.

The assembler  can go %3rest%*-ing down the program list, looking up
definitions and manufacturing the numerical equivalent of each 
instruction, then depositing that number in the appropriate machine
location.  
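
A skeleton of that process might look as follows. This is only a sketch:
labels, %3QUOTE%1d constants, and the different instruction formats are ignored
here and discussed below; %3deposit%1 (store a word in %aBPS%1) and %3encode%1
(build the numerical instruction by consulting the symbol tables) are
hypothetical.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

assemble <= λ[[code;org] prog[[loc]
        loc ← org;
   a    [null[code] → return[org]];
        deposit[loc;encode[first[code]]];
        loc ← add1[loc];
        code ← rest[code];
        go[a] ]]
.END
The value returned is the location of the newly deposited code.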

.GROUP
Below is a low level representation of a code sequence for
%3f[g[A];h[B]]%* assuming that %3h%1 is a primitive routine.

.P169:

It might be a list of the form:
.BEGIN TABIT2(10,45);SELECT 3;
\((MOVEI 1 (QUOTE A))\%1; make argument for call on %3g
\ (CALL  G)\%1; call the function%3
\ (PUSH P 1)\%1; save the value%3
\ (MOVEI 1 (QUOTE B))
\ (PUSHJ P H) ⊗↓%1Note that we use %3P%1; this assumes we use %3P%1 to save %2both%1 value and control information.←%3
\ (MOVE 2 1)
\ (POP P 1)
\ (CALL  F)  )

.END
.APART
%1

The machine representations of these instructions are encodings of
specific fields of specific machine locations with specific numbers.
For example, the operation %3PUSH%* is represented as a certain number,
called its %2⊗→operation code↔←%* or %2op code%*, which will
occupy a certain area of a machine word so that the CPU can interpret
it as an instruction to push something onto a stack.  Other fields in the
instruction are to be interpreted as references to stacks, to memory locations,
to accumulators, constants or external references to other routines.
The purpose of an assembler is to translate these mnemonic instructions
into machine instructions.

Essentially all that the assembler need do is search ⊗→symbol tables↔←
for the opcodes, for subroutine names, for accumulator and stack names,
and store the resulting values in the appropriate machine locations.
Things are slightly more complex: later we must also %6add%* information to
the tables.

We must exercise a bit of care in handling  %3QUOTE%*d expressions.
Assembling a construct like %3(MOVEI 1 (QUOTE (A B C)))%* should
have the effect of constructing the list %3(A B C)%* in free space
and placing an instruction in memory to load the address of this list into %3AC1%*.
What we must notice is that this list %3(A B C)%* is subject to
garbage collection and, if left unprotected, could be destroyed.
There are a couple of solutions. Perhaps the garbage collector could
look through compiled code for any references to free-space or full-word-space;
or we could make a list of all of these constants and let the garbage
collector mark the list.
Looking through compiled code is expensive; keeping a %3QUOTE%*-list
is a reasonable compromise. It is a compromise since that strategy
might retain unnecessary structures in case functions were redefined or
recompiled.
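
One way to realize the compromise is sketched below; %3quotelist%1 is a
hypothetical global list which the garbage collector treats as one more root
for marking. Whenever the assembler processes a %3QUOTE%1d constant it threads
the constant onto this list before planting the pointer in %aBPS%1:
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

protect <= λ[[sexpr] prog[[]
        quotelist ← cons[sexpr;quotelist];
        return[sexpr] ]]
.END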

The assembler also needs to recognize that there are different instruction formats.
That is, some instructions use an opcode and a memory reference: %3(JUMP#L)%*;
some use an opcode, accumulator, and an address: %3(PUSH#P#1)%*; and some
use a LISP construct: %3(MOVEI#1#(QUOTE A))%*.
Therefore, the assembler 
has to have an initial symbol table of opcodes  and
stack numbers.

.BEGIN TABIT2(35,45);GROUP;
Here is a sample of such a table, giving symbols and their machine equivalents:

%2\symbol\value%3
\MOVE\200
\MOVEI\201
\SUB\274
\JUMP\254
\JUMPE\322
\JUMPN\326
\PUSH\261
\POP\262
\PUSHJ\260
\POPJ\263
\RET\263
\CALL\034
\P\14
.END

And here's what the above example might resemble after being assembled:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";

		        ⊂αααπααπααααα⊃
		  100   ~201~ 1~  405~
		        εαααβααβαααααλ
			~034~  ~ 1107~
		        εαααβααβαααααλ
			~261~14~    1~
		        εαααβααβαααααλ
			~201~ 1~  406~
		        εαααβααβαααααλ
			~260~14~11121~
		        εαααβααβαααααλ
			~200~ 2~    1~
		        εαααβααβαααααλ
			~262~14~    1~
		        εαααβααβαααααλ
			~034~  ~ 1051~
		        %ααα∀αα∀ααααα$

.END

where %3A%1 is located at %b405%1; the atom %3F%1 begins at %b1051%1, 
and the instruction sequence for %3h%1 begins at %b11121%1, etc.
.SS(Compilers for Subsets of LISP)
We will examine  compilers for increasingly complex subsets of LISP
beginning with functions, composition and constant arguments and
ending with a more realistic compiler for a reasonable subset of
pure LISP. Though each subset is a simple extension of its predecessor,
each subset introduces a new problem to be solved by the compiling algorithm.
If the corresponding evaluator (⊗→%3tgmoaf%*↔←, ⊗→%3tgmoafr%*↔←, and the most
simple ⊗→%3eval%*↔←) is well understood, then the necessary steps
to be added to the compiler are easy to make.



The first compiler will handle representations of that
subset of LISP forms defined by the following BNF equations:

.BEGIN TABIT1(11);GROUP

<form>\::= <constant> | <function>[<arg>; ...; <arg>]

<arg>\::= <form>

<constant>\::= <sexpr>

<function>\::= <identifier>

.END

This LISP subset corresponds closely to that of ⊗→%3tgmoaf%*↔←, handling only
function names, composition, and constant arguments.
In the interest of readability and generality, we will write the functions using
constructors, selectors, and recognizers and supply the necessary
bridge to our specific representation by simple sub-functions.

All the compilers we develop will be derived from the second compiling
convention, saving the results on the top of the stack. Our compilers will
incorporate some knowledge of the %aSM%1 machine, and we will try to
note places where substantial assumptions about machine structure have been
made.

.GROUP

.FILL
%3compexp%1 expects to see either a constant or a function followed by a
list of zero or more arguments. The appearance of a constant should
elicit the generation of a list containing a single instruction to %3send%1
back  the representation of that constant; %3mksend[dest;exp]%*
is a call on the constructor to generate that instruction.
Since values are always found in %3AC1%1, that should
be the destination for the %3send%1. Since we are assuming the expression is a
constant, the operation can be a %3MOVEI%1.


If the expression is not a constant,
we can assume it is a call-by-value application. We should 
generate code to evaluate each argument, 
and follow that with code for a function call.

.BEGIN SELECT 3;TABs 11,18,21,35;turn on "\";nofill;

%3compexp <=\λ[[exp]\[isconst[exp] → list[mksend[1;exp]];
\\ %et%* →\λ[[z] compapply[\func[exp];
\\\\complis[z];
\\\\length[z]]]
\\\  [arglist[exp]] ]]

.END
.APART
.GROUP


.FILL
%3complis%1 gets the representation of the argument list; it must
generate a code sequence to evaluate each argument and increment the destination.
After we have compiled the last argument we should not increment  the
destination.
Notice that we have used %3append%* with three
arguments; this could be justified by defining %3append%* as a macro ({yonss(P8)}),
so that %3append[x;y;z]%1 abbreviates %3append[x;append[y;z]]%1.
.BEGIN SELECT 3;TABIT2(14,25);

%3complis <= λ[[u]\[null[u] → ( );
\ null[rest[u]] → compexp[first[u]];
\ %et%* → append[\compexp[first[u]];
\\list[mkalloc[1]];
\\complis[rest[u]]]] ]
.END
.APART
.GROUP
.FILL

%3compapply%1 has a simple task: 
it takes the list of instructions
made by %3complis%*, appends the linkage code built by %3mklink%1, which
distributes the evaluated arguments into the proper %3AC%*'s, and then adds
an instruction to call %3fn%*.
Here's %3compapply%*:
.BEGIN SELECT 3;TABIT1(30);
.P82:

%3compapply <= λ[[fn;vals;n] append[\vals;
\mklink[n];
\list[mkcall[fn]]]]

.END
.APART



.SELECT 1;
Finally, here are the constructors, selectors, and recognizers:
.BEGIN TABIT1(16)GROUP;
.P242:

%2Recognizer%3
isconst <= λ[[x]  or\[numberp[x];
\ eq[x;%et%3];
\ eq[x;%ef%3];
\ and[not[atom[x]];eq[first[x];QUOTE]]]]

%2Selectors%3
func <= λ[[x] first[x]]
arglist <= λ[[x] rest[x]]

%2Constructors%3
mksend <= λ[[dest;val] list[MOVEI;dest;val]]
mkalloc <= λ[[dest] list[PUSH;P;dest]]
mkcall <= λ[[fn] list[CALL;fn]]
mklink <= λ[[n][eq[n;1] → ( ); %et%3 → concat[mkmove[n;1];mklink1[sub1[n]]]]]

mklink1 <= λ[[n][zerop[n] → ( ); %et%3 → concat[mkpop[n];mklink1[sub1[n]]]]]

mkpop <= λ[[n] list[POP;P;n]]

mkmove <= λ[[dest;val] list[MOVE;dest;val]]
.END


Note  that  %3compexp%* is just a
complex %9r%*-mapping whose image is a sequence of machine language instructions.

The code generated by this compiler is inefficient, but that
is not our main concern. We wish to establish an intuitive and
 correct compiler, then 
worry about efficiency. Premature concern for efficiency 
is folly; we must first establish a correct and clean algorithm.
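
For concreteness, here is the instruction list which this %3compexp%1 yields
for %3f[A;g[B]]%1, represented as %3(F (QUOTE A) (G (QUOTE B)))%1; the sequence
was obtained by hand simulation of the definitions above:
.BEGIN TABIT2(10,45);SELECT 3;GROUP;

\((MOVEI 1 (QUOTE A))\%1; code for the first argument%3
\ (PUSH P 1)\%1; %3mkalloc[1]%1 saves it on the stack%3
\ (MOVEI 1 (QUOTE B))
\ (CALL  G)\%1; the value of %3g[B]%1 is left in %3AC1%3
\ (MOVE 2 1)\%1; %3mklink[2]%1 moves it to %3AC2%1 ...%3
\ (POP P 1)\%1; ... and pops the value of %3A%1 into %3AC1%3
\ (CALL  F)  )
.END
%1Note the %3PUSH-POP%1 pair surrounding the computation of the second
argument; such sequences are among the things a more careful compiler can
improve.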

.CENT(Problems)
I. Write %3compexp%1 to generate code for %2convention 1%1 as discussed
on {yon(P235)}. Compare the two versions of %3compexp%1; 
now write a more abstract version which encompasses both as special cases.

II. Write  %3compexp%1 and associated functions for a stack-only machine
using the techniques outlined on {yon(P244)}.
.SS(Compilation of Conditional Expressions)


The innovation which occurred in %3tgmoafr%* was the introduction of conditional 
expressions. To describe conditional expressions, the  BNF equations
were augmented by:

←<form> ::= [<form> → <form> ; ... ;<form> → <form>]

The addition of conditional expressions will mean an extra 
piece of code in %3compexp%*
to recognize %3COND%* and a new function (analogous to %3evcond%* in %3tgmoafr%*)
to generate the code for the %3COND%*-body⊗↓If we had designed the compiler
like the
evaluators in {yonss(P237)} we would need only attach a compile-property
to the atom %3COND%1, and make the property-value %3COMPCOND%1.←.
In fact, the only difference between %3evcond%* and its counterpart in 
%3compexp%*, which we shall call %3comcond%*, is that %3comcond%* generates
code which when executed will produce the same value as the value
produced by %3evcond%* operating on the given S-expression.

The effect of %3comcond%* on a conditional of the form:

.P103:
%aCOND%*←%3(COND%* %3(%eR%f(%3 p%41 %f)%3  %eR%f(%3 e%41%f)%3 ) ... (%eR%f(%3 p%4n%3 %f)%3 %eR%f(%3 e%4n %f)%3))

%1 follows from the discussion of %3receive_test%1 on {yon(P236)}.

First generate code for p%41%*; then generate a test for truth, going to the
code for e%41%* if true, and going to the code for p%42%* if not true.
The code for e%41%* must be followed by an exit from the code for the
conditional, and we should  generate an error condition to be executed
in the case that p%4n%* is false.

.BEGIN NOFILL;GROUP;
We  represent the code as:

.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
		∞1<code for p∞41∞1>∞g
     			/\	
      		       /  \ 
  		      ∞fG∞*    ∞fH∞*
    	             T 	   NIL
    		    /        \
		   ∞fG∞*          ∞fH∞*
            ∞1<code for e∞41∞1>∞g       ∞1<code for p∞42∞*>∞g
                ∞fG∞*               ∞fGH∞*
               /               T NIL
              ∞fG∞*               ∞fG∞*     ∞fH∞*          
             / 	     ∞1<code for e∞42∞1>∞g  ∞1<code for p∞43∞*>∞g
            ∞fG∞*              /          ∞fGH∞*
            ∞fH∞*             ∞fG∞*       ...   ...	
             \           /         ∞fG∞*     ∞fH∞*
              ∞fH∞*         ∞fG∞*         T       NIL
               \       /         /         \
                \     /         ∞fG∞*           ∞fH∞*
                 ∞fH∞*   ∞fG∞*     ∞1<code for e∞4n∞1>∞g    ∞1<code for error∞*>∞g
                  \ /        ∞fG∞*                   
                   ∞fH∞*        ∞fG∞*
		     ...   /
                      \   /
		       ∞fH∞* ∞fG∞*
.END

.END

We will express %3comcond%1 in terms of %aSM%1 primitives.
This requires the establishment of more conventions for our compiler,
since we must  implement %3receive_test%1.
.GROUP
.BEGIN CENTERIT;SELECT 2;
←More Compiling Conventions
.END

.BEGIN INDENT 10,10,10;
We must be able to test for %et%* (or %ef%*).
Previous conventions imply that the value of a predicate will be found in %3AC1%1.
We can test for the occurrence of %et%1 or
%ef%* using the %3JUMPT%1 or %3JUMPF%* instruction 
(see {yonss(P33)}) respectively⊗↓Note 
that this implementation maps everything not %ef%* onto
%et%1. Thus any value other than
%ef%* will be considered %et%*. See {yonss(P214)}.←.
.END
.APART
We can implement %3receive_test%1 using %3JUMPT%1 or %3JUMPF%1, and
we can implement %3goto%1 using %3JUMP%1.

Since our code is to be a %2sequence%* of instructions,
we must linearize the graph-representation of the generated code.
We can generate a sequence representation by appropriately interspersing labels
and %3JUMP%1s between the blocks of instructions for the p%4i%1's and e%4i%1's.
We will generate:

.BEGIN TABIT2(30,35);GROUP
.P238:
\%3(\%1<code for p%41%*>%3
\\(JUMPF 1 L1)%1
\\<code for e%41%*>%3
\\(JUMP L0)
\L1\%1<code for p%42%*>%3
\\(JUMPF 1 L2)
\        ...
\Ln-1\%1<code for p%4n%*>%3
\\(JUMPF 1 Ln)%1
\\<code for e%4n%*>%3
\\(JUMP L0)
\Ln\(JUMP ERROR)
\L0\   )
.END
%1

We need to construct the labels,  %3Li%*.  These  labels should  be
atomic and should be distinct. LISP has a function, ⊗→%3gensym%*↔←,  which
can be used for this task. %3gensym%* is a function of no arguments  whose
value is  a distinctive  symbol called  a generated  symbol, or  "gensym".
Gensyms are not true atoms since they  are not placed in the object  list;
they are usually used  only
for their unique name.  If it is desired to use them as atoms,
they  must be placed  on the object list  using
the function  %3intern%1 ({yon(P277)}).   Gensyms are  distinct from  each
other and will be distinct  from all other atoms  unless you read an  atom
with that pname⊗↓In many versions of LISP, gensyms are of the form
%3Gn%* where %3n%* is a four digit number beginning at %30000%*. Thus
the first call of %3gensym[ ]%* would give %3G0000%*; succeeding calls would give
%3G0001, G0002%*, etc.←.

We want to  write a recursive version of  %3comcond%*; therefore
we must  determine what code gets generated on each
recursion and what code gets generated at the termination case.

.GROUP;
Looking at the example, we see that the block 

.BEGIN CENTERIT;
←%3(%1<code for p%4i%*> %3(JUMPF 1 Li)  %1<code for e%4i%1>  %3(JUMP L0) Li)%1
.END
is the natural segment for each recursion and that:
.BEGIN CENTERIT;
←%3((JUMP ERROR) L0)%1
.END
is to be expected for the termination case. 
Within each
block  we need a "local" label, %3Li%1; and within each block,
including the terminal case, we refer to the
label %3L0%* which is "global" to the whole conditional.
We can now add the recognizer for
%3COND%* to %3compexp%* and construct %3comcond%*.
.APART

.BEGIN SELECT 3; TABIT2(19,30);TURN OFF"←";GROUP;

%1Add the clause:  
.BEGIN CENTER;SELECT 3;
iscond[exp] → comcond[args%4c%3[exp];gensym[ ]]; 
.END
%1to%3 compexp %1where:
%3

comcond <= λ[[u;glob]\[null[u] → list[mkerror[ ];glob];
\ %et%* → append[\comclause[first[u]; gensym[];glob];
\\comcond[rest[u]; glob] ]]]
.END

.BEGIN SELECT 3; TABIT1(28);TURN OFF"←";GROUP;

comclause <=λ[[p;loc;glob] append\[compexp[ante[p]];
\ list[mkjumpf[loc]];
\ compexp[conseq[p]];
\ list[mkjump[glob];loc] ]]

.END

.BEGIN  SELECT 3;GROUP;NOFILL;
%2Recognizer%3
iscond <= λ[[x] eq[first[x]; COND]]

%2Selectors%3
args%4c%3 <= λ[[x] rest[x]]
ante <= λ[[c] first[c]]
conseq <= λ[[c] second[c]]⊗↓%1This definition of %3conseq%1
does not allow extended conditional expressions. See  problem II on {yon(P270)}.←%3

%2Constructors%3
mkerror <= λ[[] (JUMP ERROR)]
mkjumpf <= λ[[l] list[JUMPF;1;l]]
mkjump <= λ[[l] list[JUMP;l]]

.END
.SELECT 1;

.BEGIN GROUP;TABIT2(30,38);
The partially exposed recursive structure of %3comcond%* would show:

←%3comcond[((%1p%41%*  e%41%3) ...(%1p%4n%*  e%4n%3));G0000]=

\%3(%1\<code for p%41%*>%3
\\(JUMPF 1 G0001)%1
\\<code for e%41%*>%3
\\(JUMP G0000)
\ G0001\comcond[((%1p%42%* e%42%3) ...(%1p%4n%* e%4n%3)); G0000])

.END
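.BEGIN GROUP;TABIT2(30,38);
If the conditional has only one clause, the recursion terminates at once; assuming the call
on %3gensym%* within %3comclause%* yields %3G0001%*, we would have:

←%3comcond[((T (QUOTE A)));G0000]=

\%3(\(MOVEI 1 T)
\\(JUMPF 1 G0001)
\\(MOVEI 1 (QUOTE A))
\\(JUMP G0000)
\ G0001\(JUMP ERROR)
\ G0000\   )
.END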

.SELECT 1;
We need to extend our assembler to handle the  generated labels and jumps which
appear in the conditional expression code.


.CENT(Problems)

.BEGIN TABIT2(12,28);
I. Evaluate:\%3compexp[(COND\((EQ (QUOTE A)(QUOTE B))(QUOTE C))
\\((NULL (QUOTE A))(QUOTE FOO)))]

.END
.P270:
II. Extend %3comcond%1 to handle extended conditional expressions.
.SS(One-pass Assemblers and Fixups)
The compiled instructions for conditional expressions
add one more task for our assembler: 
we must handle the generated label constructs.
Recall the pattern for conditional expression code given on {yon(P238)}.
The code sequence consists of representations of instructions and labels.
The symbols, %3L0, L1,%* and %3L2%* in that example are 
generated symbols, representing labels.  Though the gensyms are %2not%1
true atoms, they %2will%1 satisfy the test for %3atom%1. Therefore
we can recognize an occurrence of a label 
using the predicate  %3atom%*.

If the assembler recognizes a label definition, it should
add information to a symbol table, noting that the label has
been seen and that it is associated with a specific location in memory.
Further references to that label can be translated to references
to the associated machine location.  The only problem is that references
to labels may occur %6before%* we have come across the label definition 
in the program.  Such references are called %2⊗→forward reference↔←s%*.
For example, all references in the %3COND%1-code are forward 
references⊗↓If we scan the instruction  sequence in the order in which the
code would be executed, we always refer to a label before we
come across its definition. We could skirt the forward reference
problem by loading the program in reverse order: rather than beginning with
the first instruction and loading %2upward%1 in memory, we could 
begin with the last instruction and load downward.  However, this
would only be a temporary expedient: an assembler must also handle
%3prog%1s. The label structure of %3prog%1s is not restricted to such
predictable behavior.←. There are two solutions to the 
⊗→forward reference↔← problem:

.BEGIN INDENT 0,3;
%21.%*  Make two passes through the input program.  The first pass
builds a symbol table of labels describing
where the labels will be assigned in memory.
A second pass
uses this symbol table to actually assemble the code into the machine.

%22.%*  The other solution is to make one, more complex, pass through the input.
Whenever we come across a forward reference we add information to the symbol table
saying that a forward reference has occurred at this location.  We assemble
as much of the instruction as we can.  When a label definition %6does%*
occur, we check the table to see if there are any forward references pending
on that label.  If there are, we  %6⊗→fix-up↔←%* those instructions to
reference the location now assigned to the label.

.END
There are some restrictions which are imposed on one-pass assemblers, but
particularly for assembling compiled code, one-pass assemblers are usually 
sufficient and are quite fast.


.<<FIXUPS>>
There are at least two ways to handle fixups of forward references.  If
the fixups are of a particularly simple nature, say only requiring fixups
to the address-part of a word, then we may link those pending forward references
together, chaining them on their, as yet, un-fixed-up field.
.BEGIN GROUP

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

		        ⊂αααααπααααα⊃
		        ~     ~  ≤' ~←α⊃
		        εαααααβαααααλ  ~
		        ~     ~     ~  ~
		        εαααααβαααααλ  ↑
		        ~     ~     ~  ~
		        εαααααβαααααλ  ~
			~     ~  #ααβαα$
			~     ~     ~←α⊃
		        εαααααβαααααλ  ~
                        ~     ~     ~  ↑
		        εαααααβαααααλ  ~
		        ~     ~     ~  ~
		        εαααααβαααααλ  ↑
			~     ~  #ααβαα$
			~     ~     ~←α⊃
		        εαααααβαααααλ  ~
                        ~     ~     ~  ↑
		        εαααααβαααααλ  ~
                        ~     ~     ~  ↑
		        εαααααβαααααλ  ~
		        ~     ~     ~  ~
		        εαααααβαααααλ  ↑
pointer from  αααααα→	~     ~  #ααβαα$
entry in object list    %ααααα∀ααααα$  
.END
.begin centerit select 2;
←A Simple Fixup Scheme
.END
.END
Each time a forward reference is seen it is added to the linked list;
when the label is finally defined and given an address in memory, then the
address replaces each reference link. No extra storage is used since the
linked list is stored in the partially assembled code.

Another solution, which is potentially more general (that is, it could
handle arbitrary partial-word fixups) is to store the information
about each fixup in the symbol table under each label. 
.BEGIN GROUP

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
        from object list entry
		    ~	⊂αααααπααα⊃   ⊂αααααπααα⊃   ⊂αααααπααα⊃
		    %αα→~  #  ~ #αβαα→~  #  ~ #αβαα→~  #  ~  . . .
        ⊂ααααααααααα⊃   %ααβαα∀ααα$   %ααβαα∀ααα$   %ααβαα∀ααα$
        ~           ~      ↓             ↓             ↓
        εαααααααααααλ      ~             ~             ~
        ~           ~←ααααα$             ~             ~
        εαααααααααααλ                    ↓             ↓
        ~           ~                    ~             ~
        εαααααααααααλ   		 ~ 	       ~
        ~           ~←ααααααα←ααααααα←ααα$	       ~
        εαααααααααααλ   			       ↓
        ~           ~←ααααααα←ααααααα←ααααααααα←ααααααα$
        εαααααααααααλ   
        ~           ~   
        ε    . . .  λ
.END
.BEGIN CENTERIT; SELECT 2;
←Another Fixup Scheme
.END
.END
In this scheme we could store additional information with each link in the
list. That information could tell the fixup routine how to modify the
designated location.

.GROUP;
Both methods are useful. Both have been used extensively in assemblers and
compilers. We now sketch a simple one-pass assembler, named %3assemble%1.
To describe %3assemble%*, we will need two functions:
.BEGIN INDENT 0,13;

%21.%*  ⊗→%3deposit%*↔←%3[x;y]%*: %3x%* represents a machine address; %3y%* is a list,
representing the instruction to be deposited. %3y%*  could be a list of
 elements: %3(%*opcode, accumulator number, memory address%3)%*
The value of %3deposit%* is the value of %3y%*.
.END
.APART

.BEGIN INDENT 0,12;group;
%22.%*  ⊗→%3examine%*↔←%3[x]%*: %3x%* represents a machine address.  The value
of %3examine%* is the contents of location %3x%* in the form of a list as
specified above.
.END
We can now use our fixup mechanism, combined with
 %3examine%*, %3deposit%*, %3putprop%*, and
%3remprop%* from
{yon(P52)} to  write the parts of the assembler which worry about forward
references and labels.
We will use the second fixup scheme. If the label has been assigned a location
then the property list of the label will contain the indicator %3SYM%* and an
associated value representing the assigned location.
If the label has %2not%* been previously defined but has been referenced then
the atom will have an indicator %3UNDEF%*; the value-part will be a list
of all those locations which reference that label. Since we will only
be doing simple fixups, this will be sufficient. The contents of any location
referenced from such a fixup list will be a partially assembled word⊗↓which may
be an instruction
or data← with the memory address portion set to 0. 
When the label  finally is defined we must perform the fixups, 
delete the %3UNDEF%* pair, and add a  %3SYM%* pair.
There are two main functions. 
%3defloc%* is called when a label has been defined; if there are no pending forward
references then the %3SYM%* pair is simply added, otherwise the fixup mechanism is
exercised. 
When a label is referenced, %3gval%* is called. If the label  is already defined
then it simply returns the %3SYM%* value; otherwise it adds a forward reference to the
 list. 

.BEGIN SELECT 3;GROUP;TABIT3(15,22,29);TURN OFF "←";

defloc <= λ[[lab;loc] prog[[z]
\\[null[z ← get[lab;UNDEF]] → go[a]];
\fixup\deposit[\car[z];
\\\fixit[examine[car[z]];loc]];
\\[z ← cdr[z] → go[fixup]];
\\remprop[lab;UNDEF];
\a\return[putprop[lab;loc;SYM]] ]]
.END

.BEGIN SELECT 3;GROUP;TABIT1(13);

gval <= λ[[lab]\[get[lab;SYM];
\ %et%* → putprop[lab;cons[loc;get[lab;UNDEF]];UNDEF]; 0]]
.END

%3fixit <= λ[[x;l] mkinstr[op[x];ac[x];add[x];l]]%*

Notes: these functions use lots of tricks.
.BEGIN INDENT 10,14;GROUP;

%21.%1  In %3defloc%1 we use %3get%1 as a predicate, relying on our
convention that a non-%3NIL%1 value represents truth ({yonss(P214)}).

%22.%1  In that same conditional, we also rely on the fact that the value
of an assignment statement is the value of its right hand side. We appeal to
points %21%1 and %22%1 in the second conditional of %3defloc%1.

%23.%*  In %3gval%1, there is no e%41%*; recalling ({yonss(P214)}) that 
if p%41%* evaluates to something non-%3NIL%*, 
then that value is the value of the conditional expression.

%24.%*  We also use an extended conditional in %3gval%1, executing
the %3putprop%* and then returning %30%*.

%25.%1  Note also that %3loc%* is a non-local variable in %3gval%1.
.END
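To see the two functions cooperate, pick some hypothetical locations: suppose the assembler
is about to deposit the %3(JUMPF 1 L1)%* of {yon(P238)} into location %3101%*, so that %3loc%*
is %3101%*. Since %3L1%* has no %3SYM%* property yet, %3gval[L1]%* places the list %3(101)%*
under the indicator %3UNDEF%* on the p-list of %3L1%* and returns %30%*; the word deposited is
%3(JUMPF 1 0)%*. When %3L1%* is finally defined, at location %3104%* say, %3defloc[L1;104]%*
finds the pending reference, examines location %3101%*, re-deposits that word with its address
field fixed to %3104%*, removes the %3UNDEF%* pair, and records the location under the
indicator %3SYM%* via %3putprop[L1;104;SYM]%*.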


.SS(Compiling and Interpreting,,P240:)
Before adding more LISP constructs to our compiling algorithm
we should reexamine the relationships between compilers and interpreters.
The compilation of conditional expressions introduces an interesting
dichotomy between the action of an interpreter and that of a typical compiler.

We will restrict ourselves to a simple
form of the conditional expression:### %3if[p;then;otherwise]%1,## 
where %3p%1 is an expression giving %et%1 or %ef%1;
%3then%1 is the expression to be evaluated if %3p%1 gives %et%1, and
the expression %3otherwise%1
is to be evaluated if %3p%1 gives %ef%1. It is an easy exercise to express a LISP conditional in terms
of a nested %3if%1 expression.
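For example, a three-clause conditional can be written as two nested %3if%1s:
.BEGIN CENTER;SELECT 3;
[p%41%* → e%41%*; p%42%* → e%42%*; %et%* → e%43%*]%1  behaves like  %3if[p%41%*;e%41%*;if[p%42%*;e%42%*;e%43%*]]
.END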

When an interpreter evaluates a conditional expression or
an %3if%1, it will
evaluate either %3then%1 or %3otherwise%1, not both. When a compiler compiles
code for an %3if%1 expression, it compiles %2both%1 branches.
This loss of symmetry is disconcerting. Certainly, we cannot only compile
%2one%1 branch of the %3if%1 and claim that  we have faithfully
translated a conditional; however, compiling code for program segments
which may not be executed is also disconcerting. 
For example, if a particular evaluation %2never%1 takes the %3otherwise%1 branch
of a conditional⊗↓That does %2not%1 imply that the %3otherwise%1-branch will
%2never%1 be visited.←, then we need not compile code for that branch.
At a later date, a different evaluation might take that branch, and at
that time, we should be able to compile it.
We will discuss more
implications of this behavior in {yonss(P239)} when we examine
the interactive programming aspects of LISP.
This section will deal  with the mechanical aspects  of reconciling
compilers with interpreters.

We will show that it is possible to interpret %2and%1 compile
at the same time⊗↓See ⊗↑[Mit#70]↑ for a similar idea
applied to a different language.←. The relevant observation is that large parts of the
compiling and interpreting algorithms are identical; both deal with
decoding the input expression and determining which LISP constructs
are present. It is only after the interpreter or compiler has discovered 
the nature
of the expression that the specifics of compilation or interpretation
come into play.

We will build an evaluator/compiler named %3evcom%1 based on the
explicit access evaluator of {yonss(P209)}.
It will handle compilation and interpretation of the application
of both primitive functions and named λ-definitions; it will 
recognize the difference between local and non-local variable references,
compiling (and executing) calls on %3lookup%1 for non-local references
and using a faster relative addressing technique for local variables.
Finally, it will "incrementally compile" %3if%1 expressions,
executing (and generating code for) the branch designated by the
predicate, and it will leave enough information available so that
if the other branch is ever taken, code is compiled for it at that time.



Before sketching  %3evcom%1, one other implementation detail is worth mentioning.
We cannot simply replace a LISP expression with compiled code; LISP expressions
are data structures and we must be able to operate on that data structure
representation without being aware that a compilation has occurred.
For example the LISP editor ({yonss(P247)}) must be able to manipulate
the S-expr representation.
So rather than replace expressions with code we %2associate%1 the code with the
expression using an association list whose name-entries are LISP expressions
and whose value-entries are sequences of instructions⊗↓In 
actual practice, such a  representation would be prohibitively expensive
and we would substitute a hash array technique; see {yonss(P248)}.←.
The variable %3code%1 is used to hold the association list or code
buffer.

Finally, here is the sketch. We have left out many of the subsidiary functions
and all of the execution mechanism involved in %3xct%1
and %3execute%1; 
%3xct%1 executes a single instruction, and %3execute%1
is basically a combined assembler and execution
device.

.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 15,21,32,40;turn off "←";

evcom <= λ[[exp]prog[[z]
\return[\[z ← hascode[exp] → execute[z];
\\ isconst[exp] → xct1[list[mksend[exp]]];
\\ isvar[exp] → \[z ← islocal[exp] → xct1[send_code[list[mklocal[z]]]];
\\\ z ← isfun[exp] → send_code[evcom1[def[z];()]];
\\\ issubr[exp] → send_code[list[mkpushj[exp]]];
\\\ %et%3 → xct1[send_code[list[mkglob[exp]]]]];
\\ isif[exp] → send_code[evif[exp]];
\\ %et%3 → send_code[mkcode[\xct1[list[mkalloc[vars[fun[exp]]]]];
\\\\evcomlist[args[exp]];
\\\\xct1[list[mkcall[fun[exp]]]]] ]]] ]]

evcom1 <= λ[[exp;code] evcom[exp]]

xct1 <= λ[[x] xct[x]; x]

.END
Here's the essence of %3evcom%1: if the expression has code in the current
code buffer, then we execute it without further processing. A constant
is  executed and produces code, since that constant may be a subexpression
of a larger expression being compiled; we do not save the constant code in the
code buffer. Two types of variables are recognized: a local variable is
recognized by its presence in the local table;  a relative address reference
can be given for that entry. More will be said about rapid access to 
local variables in {yonss(P32)} and {yonss(P249)}. If the variable
reference is non-local, then we  compile a version of %3lookup%1; the actual
form of that code will depend on the binding implementation (shallow or deep).
Either  type of variable reference saves the code using %3send_code%1.

If the variable reference is to a function name, then we will have gotten
here through a function application. We must compile and execute code for that
function definition. We use the function %3evcom1%1 because the code buffer,
%3code%1, must be re-initialized for the compilation of the function body; 
code compiled in the outer environment won't be valid within the function body.
Finally, the variable might be a reference to a primitive function, in which case
we just return the call and let the function application execute it.

The %3if%1 statement is discussed below.

If the expression  is an application, we generate (and execute) a sequence
of code to allocate space, we compile and execute the argument list, and
then we compile (if necessary) and execute the function call.
The code sequence is constructed using %3mkcode%1; you may think of
%3mkcode%1 as %3append%1.



.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 22,27;turn off "←";

hascode <= λ[[exp] prog[[z]
\return[\[z ← findcode[exp] → cdr[z];
\\ %et%3 → %ef%1]]]]
.END
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 15,25

evcomlist <= λ[[l]\[null[l] → ();
\ null[rest[l]] → evcom[first[l]];
\ %et%3 → mkcode[\evcom[first[l]];
\\xct1[((NEXT))];
\\evcomlist[rest[l]]]]]
.END


.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 18,27;turn off "←";

evif <= λ[[exp] prog[\[l p a b]
\l ← body[exp];
\p ← pred[l];
\a ← ante[l];
\b ← ow[l];
\p ← evcom[p];
\return[list[\[receive[] → mkifa[exp;p;evcom[a];b];
\\ %et%3 → mkifb[exp;p;a;evcom[b]]]]] ]]
.END
The compilation and execution of %3if%1 expressions is interesting.
When compiling the first reference to an %3if%1 instance, we compile
the predicate and one of the branches; we associate a structure
with the instance; that structure has either the name %3ifa%1 or %3ifb%1
depending on which branch was compiled. If we come across this instance
of %3if%1 again (either in a loop or in a recursion) then we find
the %3ifa%1 or %3ifb%1 entry in %3code%1. If we pick the same branch of the 
%3if%1 then nothing new happens; but if the (compiled) predicate
evaluated to the other truth value, then we compile the other branch and
 associate a completely compiled program with the original %3if%1 expression.
The construction of the completed code is the business of the next
function, %3mkifcode%1.
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 17,30;turn off "←";

mkifcode <= λ[[p;a;b] 
\λ[[l;l1] mkcode[\p;
\\list[mkjumpf[l]];
\\a;
\\list[mkjump[l1]];
\\list[l];
\\b;
\\list[l1]]]
\  [gensym[];gensym[]] ]
.END
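Thus if %3p%*, %3a%*, and %3b%* are the instruction sequences for the predicate and the
two branches, and %3l%* and %3l1%* are the two gensyms, %3mkifcode[p;a;b]%* yields the
familiar conditional skeleton:
.BEGIN TABIT2(30,35);GROUP;

\%3(\%1<code for p>%3
\\(JUMPF 1 l)%1
\\<code for a>%3
\\(JUMP l1)
\ l\%1<code for b>%3
\ l1\   )
.END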
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 17,32;turn off "←";


.END

.BEGIN TABIT1(16)GROUP;

%2Recognizers%3
islocal <= λ[[x]in[x;local[env]]]
isif <= λ[[x] eq[first[x];IF]]
isprim <= λ[[ins] get[ins;INST]]
isfun <= λ[[x] get[x; EXPR]]


%2Constructors%3
mklocal <= λ[[var] list[SEND@;var]]
mkglob <= λ[[x] list[LOOKUP;x]]
mkalloc <= λ[[vars] list[ALLOC;vars]]
mkcall <= λ[[fn] list[CALL;fn]]
.END

We have left out a significant amount of detail and we have only covered a 
subset of LISP, but the result should be understandable; and it
should further clarify the relationships between compilation and
interpretation. Typical discussions of compilers and interpreters
lead one to believe that there is a severe flexibility/efficiency
tradeoff imposed in dealing with compilers. If you compile programs
you must give up a lot of flexibility in editing and debugging.
With  a properly designed language and machine that is not true.


.SS(A compiler for Simple %3eval%*: The Value Stack,compiler,P32:)
We wish to add more details to our basic compiling algorithms. For simplicity,
we return to the pure compiler. We develop a more complex
compiler, leaving the  details of a complex compiler/interpreter to the reader.

The major failing of the previous %3compexp%* 
({yonss(P7)}) is its inability to handle
 variables.
A related failing is its inability to 
compile code for λ-definitions. We will alleviate both difficulties in this
section. 

From {yon(P169)}, we know what %3compexp%* will do with:

←%3f[g[A];h[B]]%*

.BEGIN GROUP
.TABIT2(10,45);SELECT 3;

\(MOVE 1 (QUOTE A))\%1; get %3A%* into %31.
\(CALL  G)\%1; call the function named %3g
\(PUSH P 1)\%1; save the value%3
\(MOVE 1 (QUOTE B))\%1; get %3B%1 into %31
\(PUSHJ P H)\%1; call %3h
\(MOVE 2 1)\%1; restore the arguments in%3
\(POP P 1)\%1;   preparation for%3
\(CALL  F)\%1;   calling %3f
.END
No surprises yet. What would we expect to see for a compiled version of:

←%3f[g[x];h[y]]%* ?

We %2should%* expect to see the same code except that we would have 
instructions to send the values of %3x%* and %3y%* into %31%* at the 
appropriate time. So the first problem is 
how to find the values of variables. 
Assume we  are really interested in compiling:

←%3j <= λ[[x;y] f[g[x];h[y]]]%*

This added problem makes our question easier. Consider a call on %3j%*: %3j[A;B]%*,
for example. We know that the execution of the call occurs after the values %3A%* and 
%3B%* have been set up in %3AC1%* and %3AC2%*.
Thus at %2that%* time we do indeed know what the
values of %3x%* and %3y%* are supposed to be. 
For sake of simplicity, assume that the variables %3x%1 and %3y%1 are
strictly local. That is,
no one within the bodies of
either %3g%* or %3h%* uses %3x%* or %3y%* free; we will worry about
compilation of free references later. 
Since they are local, then only %3j%1 needs to find their values.
We cannot leave the values in the %3AC%1s since those registers are
needed for other computations.
Rather, we will save %3x%1 and %3y%1 in the top of the stack %3P%*. 

Since %3P%1 contains the
values of partial computations, and now also contains the values of the
local λ-variables, %3P%* is also called the %2⊗→value stack↔←%*. This is a 
value stack similar to that
described in  deep-binding ({yonss(P148)}); however we do not need the name stack
here. This is because the compiler will %2know%* where on the stack values of
local variables can be found. It will put them there so it %2should%* know.
This lack of a name stack is a mixed blessing; we save space, but
we have lost the names, which
are useful if we are debugging code.  
Note %3P%1 is not solely a value stack; it also contains the control information.
We are not always able to mix access and control information on one stack;
in fact, we know that a stack is not always a sufficient vehicle for
describing  LISP's access requirements.  However, a very large subset
of LISP %2does%1 allow a single-stack implementation, and we will be
compiling within that subset for most of this chapter.

Addressing the task at hand, the instructions for the body of %3j%1
will be very similar to those displayed for %3f[g[A];h[B]]%1.
 We will generate instructions to save the values on the actual parameters by
prefixing the %3compexp%*-code with two %3PUSH%* operations:
.BEGIN TABIT1(34);SELECT 3;GROUP

\(PUSH P 1)
\(PUSH P 2)
.END
.P163:
After execution of this pair of instructions,
called the %2prolog%1, the value of %3y%* is on the
top of the stack, and the value of %3x%* is the next element down⊗↓The observant
reader will note that the  %3PUSH%*  for %3x%* is unnecessary. 
Since we have assumed that %3x%* and %3y%* are strictly local, and since no one
else needs the value of %3x%* except for %3g[x]%*, we can simply compute 
%3g[x]%* directly. One might also think that we could leave
%3B%*  in %3AC2%* while we calculated %3g[x]%*; we cannot do that, as %3g[x]%*
might  use %3AC2%*. We must %3PUSH%* %3y%*.←.

Now that we have saved the values, we need instructions to %3send%1 them to
%3AC1%* when the value is needed. We will implement %3send@%1 using
the %3MOVE%* instruction ({yonss(P33)}). In this case
our memory reference will be relative to the top of the stack %3P%*.
Relative addressing is described to our machine
with a memory field of the form "%3n#P%*", where %3n%* designates the offset into 
%3P%* and references the %3n%8th%1 element, counting 
backwards from zero. Thus in our
current example "%30#P%*" refers to the value for %3y%* and "%3-1#P%*" refers to
%3x%* at the time %3j%* is entered.
Be sure to realize also that our addressing is relative; though %3"0#P"%* refers
to %3y%* at entry time, %30#P%* will %2not%* refer to %3y%* when we have pushed
something onto the stack.
Be sure to realize that we %2cannot%* change our relative addressing to hard machine 
locations in the assembler. The addressing must always be relative. We will be 
compiling code for recursive functions. Each recursive call must get a fresh
segment of the value stack in which to store its results. A similar problem appeared
when we examined the %3CALL-RET%* mechanism on {yon(P167)}. There we were dealing
with control information stored on a stack.

Finally, we cannot leave the code for %3j%* as it stands. If the prolog pushes
two entries onto the stack then we had better construct an epilog to remove them;
otherwise the stack will not be 
in the  state expected by the calling program. As we leave %3j%* we subtract
%32%* from the pointer %3P%* to synchronize the stack.
The constant %32%1 is designated as %3(C#2)%1.
Finally we exit via %3RET%*. 

One further embellishment is needed: since we are defining a function and
turning it into compiled code, we must preface the code sequence with information to
our assembler to designate %3j%* as a %3SUBR%*. The assembler will  make a new
property-value pair consisting of the property name %3SUBR%1 and  an associated 
property value indicating the  location in BPS which contains the beginning of
the code for the  procedure. That pair is  placed on the p-list of the 
atom representing the function name.
.BEGIN TABIT2(10,45);SELECT 3;GROUP;
.P165:

\(LAP J SUBR)\%1; says %3j%* is a function%3
\(PUSH P 1)\%1; save the input args%3
\(PUSH P 2)
\(MOVE 1 -1 P)\%1; get %3x
\(CALL G)\%1; call the function named %3g
\(PUSH P 1)\%1; save the value%3
\(MOVE 1 -1 P)\%1; get %3y
\(PUSHJ P H)\%1; call %3h
\(MOVE 2 1)\%1; set up the arguments in%3
\(POP P 1)\%1;   preparation for%3
\(CALL  F)\%1;   calling %3f
\(SUB P (C 2))\%1; synchronize the stack by removing  
\\;     the two saved args.%3
\(RET)\%1; exit with %3AC1%* containing the value of %3j[x;y].
.END
As you read the code and as you study its execution you should remember that
the addressing in the code is relative to the top of the stack: %3(MOVE#1#-1#P)%*
gets us %3x%* in one instance and gets us %3y%* in another, because the top 
of the stack changes.
Here is a picture of the execution of the code:

.BEGIN TURN ON "\";NOFILL;TABS 5,10,25,30,45,50,65,70;SELECT 3;GROUP;

AC1: x ; AC2: y\\\AC1: x ; AC2: y\AC1: x ; AC2: y
\ \  (PUSH P 1)\\ (PUSH P 2)\|  y\|(MOVE 1 -1 P)\| y | (CALL G)
\|\|    =>\|  x\|     =>\|   x\|        =>\| x |  =>


AC1: g[x] ; AC2: ?\\\AC1: y ; AC2: ?
\\ (PUSH P 1)\|g[x]\|(MOVE 1 -1 P)\|g[x]\| (PUSHJ P H)
\|  y\|      =>\|  y\|      =>\|  y\|   =>
\|  x\|\|  x\|\|  x\|


AC1: h[y] ; AC2: ?\AC1: h[y] ; AC2: h[y]\AC1: g[x] ; AC2: h[y]
\|g[x]\| (MOVE 2 1)\|g[x]\|  (POP P 1)\| y |(CALL F)
\|  y\|       =>\|  y\|      =>\| x |   =>
\|  x\|\|  x\|
\    \ 


AC1: f[g[x];h[y]]
\|  y\|(SUB P (C 2))\\\ \  (RET) %1return to caller.%*
\|  x\|           =>    \\=>\|\|


.END


.SS(A Compiler for Simple %3eval%*,compiler,P249:)
.BEGIN TURN ON "#";
Now that we know what the runtime code for local variable references %2could%* be,
the point is to get a compiler 
to generate code which `knows'
where on the stack we can find the value of a local variable when
we execute that code.  What we shall do is simulate the behavior of
the runtime stack while we are compiling the code.  The compiler
cannot know what the %2values%* of the variables will be at run time, but
it can know %2where%* to find the values.  We will carry
this information through the  compiler in a manner reminiscent of the
`association#list' symbol table of the %3eval%* introduced in {yonss(P6)}.  
Instead  of
posting the current values  in the stack, the compiler  will post information about  the
positions  of the variables relative  to the top of  the stack at the
time we enter  the function.   The variable-position list, %3vpl%*, 
contains this information.  If we are
to compile  a function with λ-variables, %3[x;y;z]%* then %3vpl%* will begin
with:

←%3((X . 1) (Y . 2) (Z . 3) ...%*

When we set up %3vpl%* 
we also set the %2⊗→offset↔←%*,  called %3off%*, to minus the number  of 
arguments added
to %3vpl%*, in this  case -3.  Now if we come across a reference, say to
%3Y%*, while compiling code, we  use %3cdr[assoc[Y;vpl]]%* to retrieve %32%*.   The
offset plus this retrieved value gives  us the relative position of %3Y%*
in  the stack:#-3#+#2#=#-1.   Thus  to refer  to the 
location of %3Y%*  we  use %3(...#-1#P)%*.  

What  happens as we  add
elements to  the stack?  Or to  be more precise, what  happens as we
generate code which when  executed will add  elements to the  stack?
Clearly we must modify the  offset.  If we add one  element, we would
set  %3off%* to -4.   Then to reference  %3Y%* now use  -4 + 2  = -2; that is, a
reference to %3Y%* is now of the form:

←%3( ......-2 P).%*

But that's right.  %3Y%*  is now further down in the run time  stack.  Thus
 the  `symbol table' is really  defined by %3off%*
plus the  current  %3vpl%*. Here's  a sketch of the proposed %3compexp%*
in its performance of local variable recognition.
.BEGIN ;GROUP;SELECT 3;CENTERIT;

←islocalvar[exp] → list[mkvar[1;loc[exp;off;vpl]]]
%1where:%3
←loc  <= λ[[x;off;vpl] plus[off;cdr[assoc[x;vpl]]]]
%1and,%3
←mkvar <= λ[[ac;mem] list[MOVE;ac;mem;P]]
.END
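Thus with the λ-variables %3[x;y;z]%* above, and nothing yet pushed, a reference to %3y%* becomes:
.BEGIN CENTERIT;SELECT 3;

←loc[Y;-3;((X . 1) (Y . 2) (Z . 3))] = -1%1,##and so##%3mkvar[1;-1] = (MOVE 1 -1 P)
.END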

Next, when will the compiler make modifications  to the top  of the stack? 
We push new  elements when we are compiling  the arguments to a function
call. We know that %3complis%* is the function  which compiles the  argument list.
Thus our new %3complis%* had better know about %3off%* and %3vpl%*, and
since %3complis%* changes the state of the stack, then it had better
change %3off%* appropriately:

.BEGIN SELECT 3; TABIT2(22,32);GROUP;CENTERIT;

complis <= λ[[u;off;vpl]\[null [u] → ( );
\ null[rest[u]] → compexp[first[u];off; vpl];
\ %Et%* → append\[compexp [first[u]; off; vpl];
\\ list[mkalloc[1]];
\\ complis [rest[u]; sub1[off]; vpl]]]]

.END

Notice  that   %3complis%*  compiles  the   arguments  from  left   to  right,
following each but the last with %3(PUSH#P#1)%* and recurring with  a new offset
reflecting the effect of the %3PUSH%*. This function is analogous to
%3evlis%*.

.GROUP;
Here's a brief description of the parts of the new compiler⊗↓This compiler was 
adapted from one written
by J.#McCarthy (⊗↑[McC#76]↑), and  proved correct by R.#London#(⊗↑[Lon#71]↑)
and M.#Newey (⊗↑[New#75]↑).←.

.BEGIN INDENT 0,17;
%3compile[fn;vars;exp]: fn%* is  the name of  the function to be  compiled. %3vars%*  is the
list of lambda variables.  %3exp%* is the lambda-body.

%3prup[vars;n]:  vars%* is  a  lambda list, %3n%* is an  integer.   %3prup%*  builds  a
variable-position list.

.END
.APART
.BEGIN INDENT 0,19;GROUP;

%3compexp[exp;off;vpl]%*: This function generates the code for constants and
for references to  variables.
If the variable is local, a simple %3send%1 is generated, otherwise
a call on %3lookup%1 results.
If a conditional expression is recognized, %3comcond%1 is called to 
produce the code.
If %3exp%1 does not fit one of these categories, it is assumed to be
an application of a call-by-value function.
In this case, %3complis%* compiles the argument list,
leaving the arguments in the stack; %3loadac%* loads the  appropriate %3AC%*'s;
then we generate
a call on the function, and finally generate
the %3SUB%*  to synchronize the stack.

.END
.BEGIN INDENT 0,20;GROUP;
%3comcond[u;glob;off;vpl]%*: this compiles the body of conditional expressions.  %3u%* is the
p%4i%*#-#e%4i%*  list; %3glob%* will be  bound to a  generated symbol name;
%3off%* and %3vpl%* will always be the offset and the variable-position list.
.END
.END

Fortified by the previous %3compile%* functions and this introduction,
the new %3compile%* should be clear.

.BEGIN TABIT2(12,24);select 3;TURN OFF "←";GROUP

compile <= λ[\[fn;vars;exp]
\λ[[n] append[\mkprolog[fn;n];
\\compexp[exp; -n; prup[vars;1]];
\\mkepilog[n]]]
\ [length[vars]] ]

.END
.BEGIN TABIT1(18);select 3;TURN OFF "←";GROUP;

mkprolog <= λ[[f;n] concat[list[LAP;f;SUBR];mkpushs[n;1]]]

mkpushs <= λ[[n;m][\lessp[n;m] → ( );
\%Et%3 → concat[mkalloc[m]; mkpushs[n;add1[m]]]]]

mkepilog <= λ[[n] list[mksync[n];mkret[]]]

mksync <=λ[[n] list[SUB;P;list[C;n]]]

mkret <=λ[[] (RET)]
.END
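For a function of two arguments these constructors give:
.BEGIN CENTERIT;SELECT 3;GROUP;

←mkprolog[J;2] = ((LAP J SUBR) (PUSH P 1) (PUSH P 2))
←mkepilog[2] = ((SUB P (C 2)) (RET))
.END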

.BEGIN TABIT1(17);select 3;TURN OFF "←";GROUP;

prup <= λ[[vars;n][\null[vars] → ( );
\%et%3 → concat[cons[first[vars]; n];prup[rest[vars];add1[n]]]]]


.END
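For example:
.BEGIN CENTERIT;SELECT 3;

←prup[(X Y Z);1] = ((X . 1) (Y . 2) (Z . 3))
.END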
.BEGIN TABIT3(23,39,53);select 3;TURN OFF "←";GROUP;

compexp <= λ[[exp;off;vpl]\[isconst[exp] → list[mkconst[1;exp]];
\ islocalvar[exp] → list [mkvar[1;loc[exp;off;vpl]]];
\ isnonlocal[exp] → list[mklookup[exp]];
\ iscond[exp] → comcond[args%4c%3[exp];gensym[ ];off; vpl];
\ isfun+args[exp] →\λ[[z] compapply[\func[exp];
\\\complis[z;off;vpl];
\\\length[z]]]
\\  [arglist[exp]] ]]

.END
%3compapply%1 is found on {yon(P82)}.

.BEGIN TABIT2(25,35);select 3;TURN OFF "←";GROUP;

comcond <= λ[[u;glob;off;vpl][\null[u] → list[mkerror[ ];glob];
\%Et%3 → append[\comclause[first[u]; gensym[];glob;off;vpl];
\\comcond[rest[u]; glob;off;vpl] ]]]

.END

.BEGIN TABIT1(34);select 3;TURN OFF "←";GROUP;

comclause <=λ[[p;loc;glob;off;vpl]append[\compexp[ante[p];off;vpl];
\list[mkjumpf[loc]];
\compexp[conseq[p];off;vpl];
\list[mkjump[glob];loc]]]

.END


Here is a partial sketch of %3compile%* operating on %3j <= λ[[x;y]f[g[x];h[y]]]%*.
Compare the code it generates with the code we saw on {yon(P165)}.

.BEGIN SELECT 3;TABIT2(10,17);CENTERIT;

.GROUP
compile[J;(X Y);(F (G X) (H Y))] 

%1gives:%*
\append\[((LAP J SUBR));      
\\ (PUSH P 1)
\\ (PUSH P 2)
\\ compexp[(F (G X) (H Y));-2;prup[(X Y);1]];
\\ ((SUB P (C 2))
\\  (RET))] %1
.APART

.GROUP
where:
←%3prup[(X Y);1]%*       gives        %3((X . 1) (Y . 2))%*.
.APART
.GROUP

%3compexp[(F (G X) (H Y));-2;((X . 1) (Y . 2))]%* 

results in:
%3
\append\[complis[((G X) (H Y));-2;((X . 1) (Y . 2))];
\\ mklink[2];
\\ ((CALL F))]

%1and %3mklink[2]%* evaluates to: %3((MOVE 2 1) (POP P 1))%*.
.APART
.GROUP

Thus the code we're getting looks like:

%3
\\((LAP J SUBR)
\\ (PUSH P 1)
\\ (PUSH P 2)
\\ complis[((G X) (H Y));#-2;#((X . 1) (Y . 2))]
\\ (MOVE 2 1)
\\ (POP P 1)
\\ (CALL  F)
\\ (SUB P (C 2))
\\ (RET) )

.APART
.FILL;
%3complis%1 is interesting since it actually uses the %3vpl%* we have been
carrying along. It gives rise to:
.NOFILL
.GROUP

%3
\append\[compexp[(G X);-2;((X . 1) (Y . 2))];
\\ ((PUSH P 1));
\\ complis[((H Y));-3;((X . 1) (Y . 2))]]

.APART
.GROUP
%1and the %3compexp%* computation involves, in part:

%3
\append[complis[(X);-2;((X . 1) (Y . 2))];
\\ ((CALL G))]

.APART
.GROUP
%1Finally this %3complis%* generates the long awaited variable reference using:

%3compexp[X;-2;((X . 1) (Y . 2))] %1giving,##%3((MOVE 1 -1 P))%*.
.APART
.GROUP;
So our code is:
%3
\\((LAP J SUBR)
\\ (PUSH P 1)
\\ (PUSH P 2)
\\ (MOVE 1 -1 P)
\\ (CALL G)
\\ (PUSH P 1)
\\ complis[((H Y));#-3;#((X . 1) (Y . 2))]
\\ (MOVE 2 1)
\\ (POP P 1)
\\ (CALL  F)
\\ (SUB P (C 2))
\\ (RET) )%1


Notice that the offset is different within the call:

←%3 complis[((H Y));-3;((X . 1) (Y . 2))].%1

But that is as it should be: there is an  extra value on the stack now.
.END
.APART

.CENT(Problems)

%21.%1 Complete the code generation for the above example.

.P36:
%22.%* Extend the compiling algorithm to recognize anonymous λ-expressions.

.SS(Efficient Compilation)
.P35:
We have discussed compilation  at two  different levels: we can
translate LISP expressions into sequences of the LISP control primitives
of {yonss(P230)}; or we can translate into the instructions of the
%aSM%1 machine of {yonss(P33)}. We conceptualized the compilers in  terms of
 higher level, but  biased many of our choices towards
 implementations in terms of the %aSM%1 instruction
set. Our choices influenced the efficiency of the resulting compiler.

We should first clarify what we mean by efficiency in this context.
If the compiler produces code for the LISP primitives and then we encode the LISP
primitives in terms of the %aSM%1 instruction set, then we get a simple compiler
which tends to produce inefficient code; inefficient, in terms of the %aSM%1 machine,
not in terms of the LISP primitives. Such a compiler would be  efficient
in terms of compilation time and
might suffice for debugging runs or student projects.

More likely, efficient compilation is taken to mean production of
code which we could expect from a reasonably bright machine-language programmer.
It should run reasonably fast, not have obviously redundant instructions, and
not take too much space in the machine. It is this second interpretation
of efficiency which we shall use. In this interpretation we 
don't simply implement the LISP primitives, but take a more global view
of the underlying machine. We take advantage of more of the hardware
features,  incorporating them deeper into the structure of the compiler.
This process is called optimization.
Optimization tries to overcome the mismatch between  the programming language
and the hardware machine.
The result is  a compiler which  is much more machine dependent,
requires more processing time, but  produces much better code for
that specific machine.

Examination of the compilation of even the
most simple function suggests many possible improvements, 
given the %aSM%1 machine.
A major inefficiency occurs in saving and restoring  quantities on the
stack. This is a symptom of a more serious disease: 
the compiler does not remember what will be in the ACs at run-time. Since 
we are assuming that the
arguments to a function call are to be passed through the %3AC%1's,
and since it is expensive to save and restore these registers, we should
make a concerted effort to remember what quantities are in which ACs and
not reload them unnecessarily. But note that this optimization is dependent
on the hardware of our machine; if we had only one %3AC%1, the trick would
not be applicable. 

.SS(Efficiency: Primitive Operations,,P166:)
First we should be able to generate references to constants into %3AC%*'s
other than %3AC1%*. 
.GROUP;
For example, the call on %3f[1;A]%* should be generated
as:
.BEGIN GROUP;TABIT1(33);SELECT 3;

\(MOVEI 1 1)
\(MOVEI 2 (QUOTE A))
\(CALL  F)
.END
There is  no reason to  save  constants in the stack.
.APART

We should also expect the LISP primitive operations, %3car, cdr, cons, eq,%*
and %3atom%*, to occur rather frequently in compiled code; and we
should expect that a reasonable compiler be cognizant of 
their existence and compile more efficient code for their execution.
In this section we will enlarge the instruction set of our machine, adding
plausible operations for some of these primitives⊗↓Some of these instructions
exist on the PDP-10. HLRZ and HRRZ are used for %3car%1 and %3cdr%1,
respectively; and the version of the PDP-6
which was delivered to Stanford had a hardware %3cons%1 operation.←. 
In the description of these new instructions
%3ac%* will always refer to an %3AC%*-register; %3loc%* will be either
an %3AC%* or a memory location, and %3mem%* will be reserved for memory
references only.

%3CAR%* is an instruction, taking two arguments:
an %3ac%* and a %3loc%* 
respectively. The %3car%* operation is performed from %3loc%* to %3ac%1.
For example when compiling the call, %3f[1;car[x]]%*,
we want to get %3car[x]%* in %3AC2%*. If %3x%* were in -%35#P%* then we could
accomplish our loading directly by 
%3(CAR 2 -5 P)%1 instead of:
.BEGIN TABIT1(33);SELECT 3;GROUP;

\(MOVE 1 -5 P)
\(CALL  CAR)
\(MOVE 2 1)
.END

We can also exploit the fact that the second argument to %3CAR%* is a %3loc%*:
the second argument to %3f[1;car[car[x]]]%* could have been compiled as:
.BEGIN TABIT1(33);GROUP;SELECT 3;
\(CAR 2 -5 P)
\(CAR 2 2)
.END

We will assume the existence of an analogous %3CDR%* instruction. With
these two instructions we can significantly improve the code for
%3car-cdr%*-chains.
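For example, if %3x%* were at %3-5#P%* then the second argument of %3f[1;car[cdr[x]]]%*
could be computed directly into %3AC2%* as:
.BEGIN TABIT1(33);GROUP;SELECT 3;
\(CDR 2 -5 P)
\(CAR 2 2)
.END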

Another source of  efficiency is available to us. Consider the clause:
.BEGIN CENTER;SELECT 3;

[eq[x;A] → B; ...]
.END
.BEGIN GROUP;
Assuming that %3x%* were on the top of the stack, our current compiler
would generate:
.TABIT1(33);SELECT 3;

\    (MOVE  1 0 P)
\    (MOVEI 2 (QUOTE A))
\    (CALL EQ)
\    (JUMPF 1 L1)
\    (MOVEI 1 (QUOTE B))
\    (JUMP LOUT)
\L1 ...
.END
The use of predicates in this context does not require 
construction of the constants %et%* and %ef%*. All we need to do is implement
the %3eq%* test as a jump to one of two locations.

We will introduce an instruction %3CAME%* taking two arguments:
the first an %3ac%* and the second a %3loc%*. %3CAME%* compares
the contents of  the two arguments, and if they are equal, it skips
the next instruction. 

.BEGIN TABIT1(33);GROUP;
Thus the above example could be compiled as:
.SELECT 3;
\    (MOVEI 1 (QUOTE A))
\    (CAME 1 0 P)
\    (JUMP L1)
\    (MOVEI 1 (QUOTE B))
\    (JUMP LOUT)
\L1 ...
.END
Notice that we have added an extra piece of knowledge to the compiler;
it knows that %3eq%* is commutative in this instance⊗↓If there are 
side-effects in the computation of the arguments, the
order can make a difference. However unless explicitly stated our compilers do
not have to consider side-effects.←. 
We still require some artifacts in the compiler to generate full procedure
calls on
predicates, particularly since predicates may  return  values
other than %et%1 and %ef%1. But in many instances,
particularly within %3comcond%*, we can expect to generate tighter code.

.SS(Efficiency: Calling Sequences)
We want to integrate the new compiling techniques into our
compiler. Since LISP depends heavily on procedure calls, the
computation of parameter lists and procedure calls is an area
of great concern to the  designer of a LISP compiler.

Here is the  code which the current compiler will produce for the expression
%3f[1;g[3];#car[x]]%*:

.BEGIN tabit1(33);SELECT 3;GROUP

\(MOVEI 1 1)
\(PUSH P 1)
\(MOVEI 1 3)
\(CALL G)
\(PUSH P 1)
\(MOVE 1 -2 P)
\(CALL CAR)
\(MOVE 3 1)
\(POP P 2)
\(POP P 1)
\(CALL F)
.END

By way of motivation and introduction, here is what our next compiler does for the
same call:

.BEGIN TABIT1(33);SELECT 3;GROUP;

\(MOVEI 1 3)
\(CALL G)
\(MOVE 2 1)
\(CAR 3 0 P)
\(MOVEI 1 1)
\(CALL F)
.END

Examination of the code shows the results of several optimization techniques.
We are using  the %3CAR%* instruction of the last section.
We are also doing operations into %3AC%*'s other than %3AC1%*. This
allows us to remove some of the obnoxious %3PUSH-POP%* sequences.

The major modification involves an analysis of the arguments being compiled for
a function call. 
Much of LISP's activity involves function calls. Much of the current 
compiler's inefficiency involves generation of arguments to those calls.
This is a bad combination.
Thus we should concentrate some effort on this area of the compiler.
That part is  %3complis%*.
Within our new %3complis%* we will
divide the arguments into two classes:#trivial and complex.
Since most of our worry is about the optimization of the %3AC%1's,
we will make %3complis%1 the major function of the compiler. We can define
%3compexp%1 as:

.BEGIN SELECT 3;CENTER

compexp <= λ[[exp;vpl;off] complis[list[exp];vpl;off]]
.END
%3complis%1 is the natural place to deal with register allocation since
it is responsible for the compilation of the actual parameters. The alternative
would be to pass the %3AC%1 destination to %3compexp%1. That scheme
becomes quite complex if dealt with consistently. So %3complis%1 becomes
the kernel function and must examine each argument to a function call.

Trivial arguments are those which need make no demands on the runtime stack;
the computation  they entail can all be done in the %3AC%* registers. Thus
the code that the compiler generates need not involve %3PUSH-POP%* sequences.
For example, references to constants need not be generated and then pushed
onto the stack; we can compile the other arguments first and then, just before
we call the function, load the appropriate %3AC%*  with that
constant. A similar argument can be used for postponing the loading of
variable references⊗↓But note that the argument for variables is shaky;
if our compiler handled programs with side-effects then we could %2not%*
be sure that the postponed value would be the same as that gotten if
we had loaded it at the "proper" time.←. The third trivial construct for this
%3complis%* is the handling of %3car-cdr%* chains. We will use our augmented
instruction set to perform computation of %3car%*s and %3cdr%*s directly
to a specified %3AC%*.
Complex arguments are those which require some non-trivial computation; 
each non-trivial
computation will be prefaced with a %3PUSH%1 to save the current contents of 
%3AC1%1.

Besides the compilation of efficient
code we would also like to make the compiler efficient. We would like to make the
compiling process as one-pass as possible. Our basic tasks in the new %3complis%*
are classification of the arguments and compilation of the code. With a little
care we can do both at the same time. There is nothing problematic about
the compilation of the trivial code⊗↓That's why it's trivial!←. We thus
turn to the complex code. 

The old %3complis%* generated a block <code %3arg%4i%1>-%3PUSH%1 on each
cycle. That code was followed by a %3MOVE%1 to move the last value
from %3AC1%1 to %3ACn%1. In the previous compiler %3compexp%1 was the major 
function; it handled  the bulk of the code generation. Here %3complis%1 will be the
major function. The old %3complis%1 had three states: empty argument list,
singleton argument list, and otherwise condition. The new %3complis%1 has two states;
this is done to make %3complis%1 shorter.
On each cycle through %3complis%1 we generate a
%3PUSH%*-<code#%3arg%4i%1> sequence. Now we have a spurious
%3PUSH%* on the %2front%* of the sequence; one %3rest%* will take care of %2that%*.

We must also  generate a list of %3POP%*s to suffix to  the complex code
to get the saved values back into the proper %3AC%*'s: one pop for each argument.
The %2last%* %3POP%* should be modified  to be a %3MOVE%1 since
we have not generated the corresponding %3PUSH%*. The memory
field of the last %3POP%* has the needed information; it  tells us where
the %3MOVE%* we want to make should go: 
.BEGIN CENTER;SELECT 3;

(POP P N) =>  (MOVE N 1)
.END
.GROUP
This modified list of %3POP%*s is added to the code sequence, followed by any
trivial code which we may have
generated. Note that  this reordering is strictly an efficiency consideration
under the assumption that the %3AC%1's are being used to
simulate a temporary %3dest%1 block, which will
immediately become a block of local bindings, and which is subject to
local use only. 
With this introduction, here is %3complis%* and
friends:


%3complis <= λ[[u;off;vpl] complis%9'%*[u;off;off;vpl;();();();1]]%1
.APART
.BEGIN turn on "\"; no fill;TABs 12,20,30;SELECT 3;TURN OFF "←";
.GROUP

complis%9'%* <= λ[[u;org;off;vpl;triv;cmplx;pop;ac]
\[null[u] →\[null[cmplx] → triv;
\\ %et%* → append\[rest[cmplx];
\\\ list[mkmove[mem[first[pop]];1]];
\\\ rest[pop];
\\\ triv]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,35,63;SELECT 3;TURN OFF "←";
.GROUP
\ isconst[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\off;
\\\vpl;
\\\concat[mkconst[ac;first[u]];triv];
\\\cmplx;
\\\pop;
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,33,63;SELECT 3;TURN OFF "←";
.GROUP
\ isvar[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\off;
\\\vpl;
\\\concat[mkvar[ac;loc[first[u];off;vpl]];triv];
\\\cmplx;
\\\pop;
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,36,43;SELECT 3;TURN OFF "←";
.GROUP
\ iscarcdr[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\off;
\\\vpl;
\\\append[\reverse[compcarcdr[ac;first[u];off;vpl]];
\\\\triv];
\\\cmplx;
\\\pop;
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,34,41,47,54;SELECT 3;TURN OFF "←";
.GROUP
\ iscond[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\sub1[off];
\\\vpl;
\\\triv;
\\\append[\cmplx;
\\\\concat[\mkpush[1];
\\\\\comcond[\args%4c%3[first[u]];
\\\\\\gensym[];
\\\\\\off;
\\\\\\vpl]]];
\\\concat[mkpop[ac];pop];
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,30,36,50,57;SELECT 3;TURN OFF "←";
.GROUP
\ %et%* → complis%9'%*[\rest[u];
\\org;
\\sub1[off];
\\vpl;
\\triv;
\\append[\cmplx;
\\\concat[\mkpush[1];
\\\\λ[[z] compapply[\func[first[u]];
\\\\\complis[\z;
\\\\\\off;
\\\\\\vpl];
\\\\\length[z]]]
\\\\ [arglist[first[u]] ]]];
\\concat[mkpop[ac];pop];
\\add1[ac]]]]
.END

.BEGIN CENTERIT;SELECT 3;

mkmove <= λ[[ac;loc][eq[ac;loc] → (); %et%* → list[MOVE;ac;loc]]]
.END

.BEGIN TABS 6,15,24,42;NOFILL;SELECT 3;TURN OFF "←";GROUP;TURN ON"\";

compcarcdr <= λ[[ac;exp;off;vpl]
\\[isvar[arg[exp]] → list[mkcarcdr[\func[exp];
\\\\ac;
\\\\loc[arg[exp];off;vpl]]];
\\%Et%* → concat[\mkcarcdr_ac[func[exp];ac;ac];
\\\compcarcdr[ac;second[exp];off;vpl]]]]
.APART
.GROUP

iscarcdr <=λ[[u]\[atom[u] → %et%*;
\\ iscar[u] → iscarcdr[arg[u]];
\\ iscdr[u] → iscarcdr[arg[u]];
\\ %et%* → %ef%* ]]

iscar <= λ[[x] eq[func[x];CAR]]

iscdr <= λ[[x] eq[func[x];CDR]]

mkcarcdr <=λ[[carcdr;ac;loc] list[carcdr;ac;loc;P]]
.END

.SS(Efficiency: Predicates)
We have already noted in {yonss(P166)} that some efficiencies are possible in
the handling of predicates inside of conditional expressions. Here we will examine
more possibilities. The first point of contention is that the current
%3compclause%* is %2not%* good enough. We want to be able to use the Boolean
special forms: %3and[u%41%*;#...;u%4n%*]%1 and
%3or[u%41%*;#...;u%4n%*]%1. The definition of these constructs requires
that they evaluate no more arguments than necessary.
We can use this property when using %3and%1 and %3or%1 as predicates in 
conditional expressions.
We will add recognizers for %3and%* and %3or%* inside
%3compclause%* and will add a new section to the compiler to deal with 
their compilation.

First, here is the structure of typical code sequences:

.BEGIN TABIT2(10,40);SELECT 3;GROUP;
\and[u%41%*; ... u%4n%*] → e;\or[u%41%*; ... u%4n%*] → e;
\ %1gives: \gives:%3

\   <code for u%41%*>\   <code for u%41%*>
\   (JUMPF 1 lint)\   (JUMPT 1 loc)
\   <code for u%42%*>\   <code for u%42%*>
\   (JUMPF 1 lint)\   (JUMPT 1 loc)
\    . . .\          . . .
\   <code for u%4n%*>\   <code for u%4n%*>
\   (JUMPF 1 lint)\   (JUMPT 1 loc)
\   (JUMP loc)    \   (JUMP lint)
\loc\loc
\    %1<code for %3e%*>\    <code for %3e%*>%3
\   (JUMP lout)\   (JUMP lout)
\lint\lint
.END
The label %3lint%1 indicates the next clause in the conditional
expression. Note the symmetry between the code for %3and%1 and the
code for %3or%1. There is a slight inefficiency in %3and%1 with
%3(JUMP#loc)%1 immediately followed by %3loc%1, but we can easily remove that.

.BEGIN GROUP;TABIT1(36);

Here is a %3compclause%* which will generate it:

.select 3;

compclause <=λ[[p;loc;glob;off;vpl] append[\compred[ante[p];loc;off;vpl];
\compexp[conseq[p];off;vpl];
\list[mkjump[glob];loc]]]

.END
.BEGIN TABIT3(25,44,48);GROUP;SELECT 3;

compred <= λ[[p;lint;off;vpl]\[isand[p] → compandor[\args[p];
\\off;
\\vpl;
\\list[mkjumpf[lint]];
\\()];
\ isor[p] → λ[[loc] compandor[\args[p];
\\\off;
\\\vpl;
\\\list[mkjumpt[loc]];
\\\list[mkjump[lint];loc]]][gensym[]];
\ %et%* → append[compexp[p;off;vpl];list[mkjumpf[lint]]]  ]]
.END
.BEGIN TABIT2(30,41);GROUP;SELECT 3;

compandor <=λ[[u;off;vpl;inst;fini]\[null[u] → fini;
\ %et%* → append[\compexp[first[u];off;vpl];
\\inst;
\\compandor[rest[u];off;vpl;inst;fini]] ]]
.END
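The instruction-building functions assumed by %3compred%1 and %3compandor%1 are not
defined in the text. The following is a minimal sketch, consistent with the code
templates above; %3mkjumpnil%1 is presumably just another name for the %3JUMPF%1
generator, and the exact instruction formats are only illustrative.
.BEGIN SELECT 3;GROUP;NOFILL;

mkjump <= λ[[l] list[JUMP;l]]

mkjumpt <= λ[[l] list[JUMPT;1;l]]

mkjumpf <= λ[[l] list[JUMPF;1;l]]

mkjumpnil <= λ[[l] mkjumpf[l]]
.END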

.CENT(Problems)
%2I%* We should recognize the construct %et%*#→%3e%4i%1 in conditional expressions
and compile special code for it. We should also realize that in the construct:
.BEGIN CENTER;SELECT 3;
[p%41%* → e%41%*; ... %et%* → e%4i%*; ... p%4n%* → e%4n%*]
.END
we can %2never%* reach any part of the conditional after the %et%*-predicate.
Therefore no code should be generated. Rewrite the compiler to handle these
additional observations about conditionals. 

The second point, above, is a special instance of a general  compiling question.
How clever should the compiler be? If it can recognize that a piece of program
can never be reached, should it tell the user or should it compile minimal code?

%2II%* Write a new %3compile%* including all the efficiency considerations
discussed so far.

%2III%1 When we apply the convention that anything non-%3NIL%1 is a representation
of truth, it is often convenient to evaluate %3and%1 and %3or%1 for "value".
That is, their value is either %3NIL%1 or non-%3NIL%1. Extend our compiler
to handle such uses of these functions.

%2IV%1 Extend the compiler to compile efficient code for compositions
of the predicates %3and%1, %3or%1, and %3not%1.
.SS(A Compiler for %3progs%*)
The compiler of this section will not compile all %3prog%*s; it is only
intended to demonstrate some of the salient features of a %3prog%* compiler.
They are:
.BEGIN INDENT 0,5,5;
.GROUP
%21.%* Handling of assignments. Since we are assuming local variables,
storage into the value stack is sufficient.
.APART
.GROUP
%22.%* The %3go%*-label pair. We will assume that this can be passed off to the
assembler.
.apart
.group
%23.%*  On leaving a %3prog%*-body we have to remove the %3prog%*-variables
from the top of the stack. This is done by comparing the current %3off%*
with %3vpl%*.
.END

.BEGIN SELECT 3;GROUP;nofill;TABIT3(15,26,34);

compprog <=λ[[locals;body;off;vpl]
\λ[[n]append[\mkpushlistnil[n];
\\compbody[\body;
\\\labels[body];
\\\difference[off;n];
\\\pruploc[locals;-off;vpl];
\\\n;
\\\gensym[]]]]
\   [length[locals]]]
.END
.BEGIN SELECT 3;GROUP;nofill;TABIT2(24,35);

pruploc <= λ[[locals;off;vpl]\[null[locals] → vpl;
\ %et%* → pruploc[\rest[locals];
\\add1[off];
\\concat[cons[first[locals];off];vpl]]]]

.END
.BEGIN SELECT 3;GROUP;TABIT1(18);

labels <= λ[[body]\[null[body] → ();
\ islabel[first[body]] → concat[first[body];labels[rest[body]]];
\ %et%3 → labels[rest[body]]]]
.END
.BEGIN SELECT 3;GROUP;nofill;TABs 15,26,39,48,51,55; turn on "\";

compbody <= λ[[body;labels;off;vpl;n;exit]
\[null[body] →list[mkconst[1;NIL];exit;mksync[n]];

\ islabel[first[body]] → concat[first[body];compbody[\rest[body];
\\\\\\labels;
\\\\\\off;
\\\\\\vpl;
\\\\\\n;
\\\\\\exit]];
\ isgo[first[body]]  →  append[\list[compgo[arg[first[body]];labels]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\ isret[first[body]] →  append[\compexp[arglist[first[body]];off;vpl];
\\\sync[off;vpl];
\\\list[mkjump[exit]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\ issetq[first[body]] → append[\compexp[rhs[first[body]];off;vpl];
\\\list[mkmovem[\1;
\\\\\loc[lhs[first[body]];off;vpl]]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\ iscond[first[body]] → append[\compcondprog[arg[first[body]]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\%et%* → append[\compexp[first[body];off;vpl];
\\compbody[rest[body];labels;off;vpl;n;exit]]]]
.END


.BEGIN SELECT 3;GROUP;

compgo <= λ[[x;l][member[x;l] → mkjump[x]; %et%3 → err[UNDEFINED_TAG]]];
.END
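%3compprog%1 and %3compbody%1 rely on several instruction-building functions which are
not defined here: %3mkpushlistnil%1, %3mkconst%1, %3mkmovem%1, and %3mksync%1. Here is a
minimal sketch of plausible definitions, using the LAP conventions seen elsewhere in this
chapter, %3(C n)%1 for a constant operand and %3P%1 for the stack pointer; they are meant
only to make the intent concrete, and %3sync%1, which must compute its cut from %3off%1
and %3vpl%1, is left as an exercise.
.BEGIN SELECT 3;GROUP;NOFILL;TABIT1(22);

mkconst <= λ[[ac;c] list[MOVEI;ac;list[QUOTE;c]]]

mkmovem <= λ[[ac;loc] concat[MOVEM;concat[ac;loc]]]

mksync <= λ[[n] list[SUB;P;list[C;n]]]

mkpushlistnil <= λ[[n]\[eq[n;0] → ();
\ %et%* → concat[mkconst[1;NIL];mkpushnil[n]]]]

mkpushnil <= λ[[n]\[eq[n;0] → ();
\ %et%* → concat[(PUSH P 1);mkpushnil[sub1[n]]]]]
.END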

This %3compprog%1 only handles a subset of the semantics of %3prog%1.
We do not handle any non-local jumps; a new list of labels is made up on 
entry to a %3prog%1 and only that set of labels is accessible for %3go%1s.
As a further  restriction,
we also assume that the %3prog%1 variables are used in a strictly local
fashion. 


.CENT(Problem)
Write %3compcondprog%*.
.SS(Further Optimizations,,P258:)
This section  is in the nature of hints and possibilities  for expansion of the
basic compiling algorithms.

One of the first things to note about the compiling algorithm is its lack of 
knowledge about what
it has in the various %3AC%*'s. Frequently  the
compiled code will load up one of the registers with a quantity that is already 
there. Thus the first suggestion: build a list of 
what's in various registers. We know what's there when we enter the function;
whenever we perform an operation which destroys a register then we have to 
update the compiler's memory. Whenever we need a quantity, we check the
memory. If the object is already in the %3AC%*'s then we use it. Clearly there is a
point at which the complexity of the object stored is too complicated to be worth
remembering. However, the idea can be used quite profitably for variable references
and simple computations. This idea is a simple form of
%2common sub expression elimination%*. For example, 
assuming that the compiler knows that %3x%* is in %3AC1%*, here's code for:
.BEGIN SELECT 3;CENTER

f[car[x];cdr[car[x]]].
.END
.BEGIN TABIT1(33);SELECT 3;GROUP;

\(CAR 1 1)
\(CDR 2 1)
\(CALL F)
.END
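A minimal sketch of the bookkeeping involved: let the compiler carry an association
list, say %3acmem%1, pairing each %3AC%1 number with the symbolic form it currently
holds, e.g. %3((1 . X) (2 . (CAR X)))%1; before generating a load, ask whether the
form is already in some register. The names and the representation here are
hypothetical.
.BEGIN SELECT 3;GROUP;NOFILL;TABIT1(22);

inac <= λ[[exp;acmem]\[null[acmem] → %ef%*;
\ equal[exp;rest[first[acmem]]] → first[first[acmem]];
\ %et%* → inac[exp;rest[acmem]]]]
.END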

This idea can be extended. There is nothing sacred about knowing only the
contents of the special registers. We could keep a history of the partial computations
in the stack. Then if we need a partial result we might find it already computed
in the %3AC%*s or stored on the stack. 
We might also  keep track of whether stack or %3AC%1 contents are still needed.
For example, in our compiled function %3j%1 on {yon(P165)} we might have
noticed that after the call on %3g%1, the value of %3x%1 was no longer needed;
therefore we need not save %3x%1. Similarly we don't need the value of %3y%1
after the call on %3h%1. If we build this kind of information
into a compiler, we can generate more efficient code. However,
many of these ideas must be used with some care.
Side-effects can destroy the validity of partial results.

Notice that we are comparing the %2symbolic%1 values in the %3AC%*'s  or stack;
we cannot look for actual values. This idea of symbolic processing can be exploited
at a much more sophisticated level in LISP compilers.
In particular, we can perform program transformations.
For example, the compiler can rewrite program segments taking advantage of
transformations it knows. These transformations typically involve equivalence
preserving operations which might lead to more efficient compiled code.

For example several LISP compilers have the ability to perform recursion removal,
replacing recursive programs with equivalent iterative versions⊗↓All these
transformations are typically  invisible to the user.←.
Here's a case:

.BEGIN SELECT 3;CENTERIT;
.p172:

←rev <= λ[[x;y][null[x] → y; %et%* → rev[rest[x];concat[first[x];y]]]]
.END
This program is automatically rewritten as:
.BEGIN SELECT 3;GROUP;TABIT2(20,23);turn off "←";

rev <= λ[[x;y] prog[[]\l\[null[x] → return[y]];
\\y ← concat[first[x];y];
\\x ← rest[x];
\\go[l] ]]
.END
This second version makes no demands on the run-time stack to save its
partial computation like the recursive version would. Typically 
this second version will execute faster.

A major obstacle to  most kinds of optimization is the unrestricted use
of labels and %3go%1s. Consider a piece of compiler code which has a label attached
to it.
 Before we can be assured of the integrity of an %3AC%1
we must ascertain that every possible path to that label maintains that %3AC%1.
This is a very difficult task. The label and goto structure required by
%3compile%* is quite simple. However if we wished to build an optimizing  
compiler for
LISP with %3prog%*s we would have to confront this problem.

.CENT(Problems)
%21.%* Extend the compiling algorithm to remember what it has in its %3AC%*
registers. How much of the scheme is dependent on lack of side-effects?

%22.%* Titled: " If we only had an instruction... "
We  advocate an instruction, %3EXCH ac loc%*, which will exchange
the contents of the %3ac%1 and the %3loc%1. This instruction could be used
effectively on the code for %3j%* on {yon(P163)} to save a %3PUSH-POP%*
pair. 
.BEGIN GROUP
Here's %3EXCH%* in action, using the results of  the previous exercise:
.TABIT2(10,45);SELECT 3;

\((LAP J SUBR)\%1; says %3j%* is a function%3
\ (PUSH P 2)
\ (CALL G)\%1; call the function named %3g
\ (EXCH 1 0 P)\%1; save the value and dredge up %3y
\ (CALL H)\%1; call %3h
\ (MOVE 2 1)
\ (POP P 1)\%1;   preparation for%3
\ (CALL F)\%1;   calling %3f
\ (RET))\%1; exit.%3 AC1%* still has the value from %3f.
.END

Look for general situations where %3EXCH%* can be used. In your tour
through the compilers, try to imagine other useful instructions.

%23.%* Write  code for the factorial function, and simulate
the  execution on %32!%*.

%24.%* Write a LISP function to take recursive schemes into equivalent iterative
ones in the style of the %3rev%* example on {yon(P172)}. Your first
version need not be as efficient as the one advertised there, but try to make it
better as you proceed. See ⊗↑[Dar#73]↑ for  this and related transformations.
.SS(Functional Arguments,,P3:)
Function variables add more complication to the compiling algorithms.
We will address the simpler cases of functional arguments.
There are two issues involved: what to compile when a %3function%1 construct
is recognized; and what to do when the function position of an application
is a variable.

.GROUP;
Consider an example:

.BEGIN CENTER;SELECT 3;
%3foo[ ...;function[%7f%3];...]%1
.END

%1We generate %3(MOVEI 1 %7f%3)%1 and compile %7f%1 if it is a
λ-definition; otherwise we essentially generate %3(MOVEI 1 (QUOTE %7f%3))%1.

Assume %3foo%1 is defined as:
.BEGIN CENTER;SELECT 3;
foo <= λ[[ ...;g; ...] ....g[t%41%3; ...;t%4n%3]]
.END
.APART
.GROUP;
The instance of %3g%1 in
%3g[t%41%3; ...;t%4n%3]%1
is a special case of a computed function ({yon(P227)}); in this case, the computation is
only a variable lookup. We will display the more general code for 
a computed function call of the form:
.BEGIN CENTER;SELECT 3;

exp[t%41%3; ...;t%4n%3].
.END

We get:
.BEGIN SELECT 3;GROUP;TABIT2(25,32);

\append[\compexp[exp;off;vpl];
\\list[mkalloc[1]];
\\complis[(t%41%3; ...;t%4n%3);off-1;vpl];
\\list[mkalloc[1]];
\\((CALLF n 0 P));
\\((SUB P (C 1)))]
.END
.APART

The calling  structure  for  a  functional argument  
is slightly different. The arguments are on the stack  but,
more importantly, note that the call must  always  be
trapped  and decoded.  We cannot replace that call with
a  %3PUSHJ%1 to some machine  language code for  the function because the
function referred to can change.  We  use  a %3CALLF%1 instruction
to designate a call on a functional argument.
If the function is a variable name, we cannot know at compile time what to
%3PUSHJ%* to. Since the value of the expression may very well change
during execution we can %2never%* replace the %3CALL%* with a %3PUSHJ%*.

Often, unneeded
generality allowed by functional arguments  can  be compiled
out.
Production LISP compilers, like the MacLISP compiler, are able to compile
efficient code for many instances of the LISP mapping functions, like
%3maplist%1; in the general case the code must expect arbitrary functional
arguments.

The problems of compiling efficient code become magnified  if 
generalized control structures are  anticipated. The problem is similar
to that of recognizing an implied loop construct in a program 
using labels and %3go%1's to control the algorithm. Control constructs
like %3catch%1 and %3throw%1 ({yon(P195)}) have some advantages 
here; rather than  using evaluation relative to
arbitrary access and control environments (⊗↑[Bob#73a]↑),
these constructs do impose some regularity which the compiler and 
the programmer can exploit.
.SS(Macros and Special Forms,macros,P57:)
We now wish  to extend our compiler to handle macro definitions.
Consider the example of defining %3plus%* of an indefinite number of arguments
given on {yon(P58)}.
In the presence of a compiler we can frequently make execution of macros
more efficient than their special form counterpart. 
The distinction is that macros only involve transformations 
which can be executed at compile time, whereas a special form
may involve run time information.
For example, consider the case of the macro definition of %3plus%1.
When %3plus%* is called we know the number of arguments, and can simply
expand the macro to a nest of calls on %3*plus%*. For example:
.begin centerit;

%3
←plus[4;add1[2];4]     %1expands to%*    *plus[4;*plus[add1[2];4]] %1

.end
which can be efficiently compiled. 
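The macro definition referred to above appears on {yon(P58)}; in the style of the
%3coef%1 macro below, it might be sketched as follows, where the expansion function
%3expand%9'%*%1 is only illustrative:
.BEGIN SELECT 3;GROUP;NOFILL;TABIT1(22);

plus <%4m%*= λ[[l] expand%9'%*[rest[l]]]

expand%9'%* <= λ[[u]\[null[rest[u]] → first[u];
\ %et%* → list[*PLUS;first[u];expand%9'%*[rest[u]]]]]
.END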

Macros can also be used effectively in implementing abstract data structures and
control structures. For example,
the constructors, selectors, and recognizers which help characterize
a data structure can be expressed as very simple S-expr operations.
These operations are performed quite frequently in a data structure algorithm
and so any increase in their running efficiency will be beneficial.
Recall that on {yon(P60)} we defined %3coef%* 
as %3car%*. Compiled calls on %3coef%* would invoke the function-calling
mechanism, whereas many compilers can substitute actual hardware instructions
for calls on %3car%*, resulting in more efficient run-time code.
It would be better to write %3car%* instead of
%3coef%*. There are two objections to this. First, %3coef%* has more
mnemonic significance than %3car%*. Second, using %3car%* we have explicitly
tied our algorithm to the representation. Both are strong
objections.
Macros can help overcome both objections. Define:

←%3coef <%4m%*= λ[[l] cons[CAR;cdr[l]]]%1.

The user writes %3(COEF ...)%*; the evaluator sees %3(COEF ...)%* and
 evaluates %3(CAR ...)%*;  the compiler sees %3(COEF ...)%*
and compiles code for %3(CAR ...)%*. With macros, we 
can get the efficient code, the
readability, and flexibility of representation.


Macros can also be used to perform most of the operations which special forms
are meant to do.
Since %3eval%* handles calls on special forms, we should examine the
extensions to %3compile%* to generate such code. We have seen that
in compiling  arguments to (normal) functions, we generate the code for
each, followed by code to save the result in the run-time stack, %3P%*.
The argument to a special form is %2unevaluated%*, by definition. All we
can thus do for a call of the form %3f[l]%*, where %3f%* is a special form,
is pass the argument, compiling something like:

.BEGIN CENTERIT;SELECT 3;GROUP;

←(MOVEI AC1 (%eR%f(%3 l %f)%3))
←(CALL 1 (E F))
.END
We have already mentioned some of the dangers in using special forms;
the fact that a compiler cannot do much with  them either,
makes them even less attractive.


.CENT(Problems) 

I. Extend the last %3compile%* function to handle macros.

II. Assume ⊗→%3and%*↔← allows  an indefinite
number of arguments.  Show how to modify %3compile%* to recognize %3and%*
and compile efficient code for its execution.

III. Define %3and%1 as a macro in terms of %3cond%1. Compare the code
produced in the two cases. How could you improve the compiler to make the two
sets of code more nearly alike?
.SS(Compilation and  Variables,non-local variables,P5:)
.BEGIN TURN ON "#";
The models of compilation which we have sketched so far store
their  λ-variables  in  the  stack,  %3P%*.   References  to  those
variables in  the body of  the λ-expression are  made to  those
stack entries.  This scheme suffices  only for lambda or %3prog%* 
variables which are used in a strictly local fashion.   
We have said that λ-expressions may refer to global
or free  variables.  The  lookup mechanism  simply finds the  latest
binding of  that variable in the  current symbol table; this is LISP's
dynamic binding strategy.
There are potential  difficulties for compiled code
when dealing with dynamic binding.   
The problem involves reference to variables which are currently λ-bound
but are non-local. Such variables 
are called %2⊗→special variable↔←s%1.


Assume first, that we are attempting
a deep binding implementation. If
all we store  on the stack is the value  of a
variable, then another  program which  expects  to use  that value
 will have no way of  finding that stored value.  One  scheme
is to store  pairs on the stack:  name and value, then we
can  search the stack for  the latest binding.  
This scheme is compatible with the stack implementation of deep binding given
in {yonss(P148)}.
 The  compiler can still  `know' where  all the
local variables  are on  the stack  and  can be  a bit  clever  about
searching for  the  globals or special variables.

Shallow binding implementations offer an alternative. We can still store
variables on the stack⊗↓We assume throughout this discussion that we are
compiling code for the stack implementation of shallow binding
as given in {yonss(P225)}.← if we are
sure that the variable is used in a strictly local fashion. 
If a variable is to be used as a special variable then
the compiled  code  should   access the
value cell of that variable.  
The compiler recognizes a variable as special by looking for the
property name %3SPECIAL%1 on the property list of the atom; if the property
exists and its 
value is %et%1 then the variable is a special variable and
the compiler generates different code.
 When a  variable, say %3x%*, is
declared special the compiler will emit a reference to %3x%* as 
%3(GETV#AC%4i%*#X)%1
or %3(PUTV#AC%4i%*#X)%1  
rather than the corresponding
reference to a location on the  stack.  
%3GETV%* and %3PUTV%* are 
 instructions to access or modify the contents of the value#cell.

When the LISP assembler  sees
one of these instructions, it will locate the value#cell
of the atom and assemble a reference to that cell.  Since the location
of the value#cell does %2not%*  change, we can  always find the  current
binding. Any
interpreted function  can also sample the   value#cell so non-local values
can be passed between compiled and interpreted functions.  

Non-local variables therefore necessitate some changes to our compiler:
we have to be a bit careful in dealing with special variables.
Assume a function %3f%* calls a function %3g%*, and assume that
%3g%* uses  some of %3f%*'s λ-variables.
The usual compilation for %3f%* would place the λ-variables in the stack
and they would not be accessible to %3g%*. Our compiler must therefore be
modified to generate
different prolog  and epilog code for special variables.
The code must  save the old contents of each special value#cell 
on entry to the function, and the compiler must generate code to restore
those cells at function exit.
As we described above, any references in either %3f%1 or %3g%1 
to those special variables
will involve  %3GETV-PUTV%* rather than 
references into the stack %3P%*. In this scheme, %3lookup[x;env]%1
is given by %3getv[x]%1.
.END
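To make this concrete, here is a sketch of the prolog and epilog which might be
emitted for %3f <= λ[[x] ...]%1 when %3x%1 is declared special. The choice of
%3AC2%1 as a temporary and the exact instruction forms are assumptions in the
spirit of the LAP code shown earlier, not a definitive calling sequence.
.BEGIN TABIT2(10,42);SELECT 3;GROUP;

\((LAP F SUBR)
\ (GETV 2 X)\%1; prolog: fetch the old contents of %3x%1's value cell%3
\ (PUSH P 2)\%1; and save it on the stack%3
\ (PUTV 1 X)\%1; bind: the argument in %3AC1%1 becomes the current value of %3x
\   . . .\%1; body: references to %3x%1 compile as %3GETV%1 and %3PUTV%1 instructions%3
\ (POP P 2)\%1; epilog: retrieve the saved value%3
\ (PUTV 2 X)\%1; and restore the old binding%3
\ (RET))
.END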


Non-local variables  cause several problems in LISP. The 
simple  mechanism we used for referencing local variables is no longer
applicable. Other programming languages allow the use of non-local variables,
some, like APL (⊗↑[Ive#62]↑), only allow global variables; others,
like Algol60 and its
successors (⊗↑[Alg#63]↑ and ⊗↑[Alg#75]↑), allow  free as well as global variables.
However, Algol compilers are much simpler to construct than LISP compilers,
and we should explore some of the reasons⊗↓For  reasons other 
than those we are addressing here, APL compilers are
difficult to construct.←.
One simplification in Algol is its treatment of procedure-valued
variables. Algol dialects typically restrict themselves to
what LISP calls functional
arguments. Algol
dialects do not allow arbitrary procedures to be returned as values.
Their restrictions
allow the run time environment to be modelled in a stack, typically
as described in {yonss(P148)}.

The  difference between LISP and Algol, which is more to the point here, is
their binding strategies#({yonss(P134)}). A typical LISP uses dynamic binding
whereas Algol  uses static binding. The difference is that Algol translators
determine the bindings of variables at the time the definition is made
whereas LISP determines bindings at the time the function is applied.
That is, definitions in Algol always imply an application of %3function%1,
binding up all non-local variables. 
The net effect is that Algol programs don't have free variables in the
sense that there is any choice of bindings; all choices
have been made when a procedure is declared. That 
binding decision has dramatic
results when we come to implement language translators. As a result,
Algol can effectively compile all variable references
to be references into the run time stack, and need not retain the
name stack for variable look up at run time.  It is not at all clear
yet which  binding strategy will dominate. Counterexamples to exclusive
use of either strategy exist. Recent work (⊗↑[Ste#76b]↑, ⊗↑[Sus#75]↑, ⊗↑[Hew#76]↑)
points to the use of static binding in LISP-like languages.

A typical preserve of dynamic binding is  that of interactive programming,
where programs are created as separate entities to be connected into
a cohesive whole at some later time. Frequently one wants the bindings
of the free variables to be postponed until such a grouping is made.
.SS(Interactive Programming,,P239:)

.BEGIN INDENT 20,20;
"... What is actually happening, I am afraid, is that we all tell each other
and ourselves that software engineering techniques should be improved
considerably, because there is a crisis. But  there are a few boundary 
conditions which apparently have to be satisfied. I will list them
for you:

%21.%1 We may not change our thinking habits.

%22.%1 We may not change our programming tools.

%23.%1 We may not change our hardware.

%24.%1 We may not change our tasks.

%25.%1 We may not change the organizational set-up in which the work
has to be done.

Now under these five immutable boundary conditions, we have to try to
improve matters. This is utterly ridiculous. Thank you. (%3Applause%1).
.END
.BEGIN TURN ON"→";
→E. Dijkstra,  %3Conference of Software Engineering, 1968%1.
.END



We have talked about the constructs of LISP; we have talked about
interpreters and compilers for LISP; and we have talked a little about
input and output conventions for the language. 
The combination of these properties  leads us to a most interesting 
practical aspect of LISP as a programming language: its use as an
interactive programming language. A programming language is a tool
for building programs. LISP's 
representation of programs as data structures coupled with 
the availability of display terminals offers the LISP programmer
unique opportunities for the interactive construction of programs.
Historically, machines have been oriented towards the rapid 
execution of well-defined numerical algorithms. This perspective
overlooks two important points.

First, the actual process of discovering,
refining, and encoding the algorithm is a complex process. 
In the early days of computation,
the programmer performed the analysis
on the algorithm on paper, transcribed the algorithm to a programming
language, encoded that program on coding sheets and keypunched  a
card deck. That deck was supplied with data cards and presented to the machine.
If some
abnormal behavior was detected in the program, the programmer was
presented with an uninspiring octal dump of the contents of
memory. Memory dumps were an appalling debugging technique, even then.
Programming was frequently done in a higher level language so the contents
of most machine locations had little meaning to the programmer. Also the
state of the machine at the time the dump was taken frequently had only 
a casual relationship with the actual bug.

The programmer had a slightly more appealing alternative 
called %2tracing%1. 
A program could be embellished with print statements whose purpose was to
determine  access  behavior by printing values of variables,
and to  discover control behavior by printing messages at entry and exit from
procedures.  This tracing technique was frequently available as an
operating system option. Then the programmer would  supply control cards which
expressed what kind of output  was desired. In either case,
unless this  technique was used with resolute precision, the output
would either be voluminous or uninformative, or both.

When the cause of a bug was
discovered the offending cards were replaced and the
deck was resubmitted for execution. 
This cycle was repeated until
an acceptable program was developed. This approach is still followed
by a majority of programmers. What is missed is that
much of the detail and tedium of these early phases of program development
can be aided by a well-constructed interactive programming system.

The second point which is overlooked is that a large class of interesting
problems are not included in the range of "well-defined numerical algorithms".
In fact most of the problems which are traditionally attacked by LISP
programs fall into this class: language design, theorem proving,
compiler writing, and of course, artificial intelligence. 
In such "exploratory programming" it is often the case that no well-defined
algorithm is known, and it will be the final program which %2is%1 the
algorithm. Such exploratory programming requires that the programming language
be usable as a sophisticated "desk calculator". It requires experimentation with,
and execution of, partially specified programs; that is, the ability to
develop and run pieces of programs; to build up a larger program from pieces;
to quickly modify either the program text or the computational results themselves,
before the programmers have lost their train of  thought.

An important outgrowth of such exploratory programming is a LISP technique called
"throw-away implementation". In developing a large programming system
one begins with a few simple ideas and incrementally develops the larger system.
If some of the ideas are inadequate they are replaced. At some stage an adequate
model is  developed; it may be lacking in some aspects, but it does model the
desired phenomenon. At that point, the programmer's understanding has
been sufficiently improved that the implementation should be thrown away and a
new, more enlightened version created.
Certainly this kind of programming can be accomplished with languages other
than LISP; the point is that an interactive LISP environment is 
sufficiently  responsive that an implementation cycle is quite short and
relatively painless. The effect is that the programmer  does not invest
so much effort in an implementation that he feels  a compulsion to 
patch an inferior implementation, rather than start afresh.


The development of an interactive programming system has several important
implications. The usual partitioning of program preparation into editing,
running, and debugging  is no longer adequate. The text editor and the debugger
are
 integral parts of the system. A programming "environment" is established
in which all facets of programming are integrated. The tasks which are usually
performed by an operating system are  subsumed by the programming language.
The idea of a separate file system becomes obsolete, and  all programs and
data are accessible from within the "environment". This has an important
advantage in  LISP-like representations of programs: the conversion from
internal representation to "text file" format is eliminated. The
technique puts added burden on naming facilities so that a programmer's
definitions are accessible, but are unambiguously addressable.
The effect is to  structure the interactive environment as a very large data base
containing programs and data structures. The programmer has accessing procedures
to manipulate the elements in the base. All the items in the base
are accessible as data structures; 
the editor can modify any objects in the base.
Some of the objects can be executed as procedures; the evaluator is
responsible for this. A procedure object can be further expanded into
machine instructions for faster execution; this may either be done by an
explicit call on a compiler, or better yet, be done invisibly
by a compiler/interpreter. If an evaluation is not performing
as expected, yet another data structure manipulating program is available.
The debugger is able to manipulate both the program structure and the run#time
data structures which the evaluator has created. Any of these data structure
manipulating programs is able to call any other program. This
view of program development is in direct conflict with the
traditional approach  which grew from the card#deck philosophy.
Several current  research projects are developing along these lines;
among them are ⊗↑[Hew#75]↑, ⊗↑[Int#75]↑,  and  ⊗↑[Win#75]↑. All of these
projects are based on LISP.

It is not accidental that LISP is the major language for these kinds of
programming tasks; it is the features of the  language which make it
amenable to such programming tasks. In the next two sections
we will discuss two of the ingredients of interactive programming
systems; this will illuminate the features of LISP which make it
important for interactive programming. 


First we will sketch some basic features of LISP text editors. This will
show some of the benefits of having the program structure available as
a data structure. The succeeding section will discuss a few features of a typical
LISP debugging system; this will  further demonstrate the advantages of having
a natural program representation available at run time.

There is no standard LISP editor or debugger.
Therefore the next 
sections will contain general information rather than
an abundance of concrete examples. 
The design of such devices
is a subject of very personal preferences and prejudices. Some characteristics
are common and those we will stress. A related difficulty is that editing and
debugging devices are best done as interactive display programs, rather
than as key-punch or teletype programs⊗↓That is one of the author's 
preferences and prejudices.←. 
Interactive programming is a very visual and dynamic enterprise;
teletype-oriented  interaction is not sufficient since it
results in a presentation  more like a comic strip than a movie.
.SS(LISP Editors,,P247:)
A LISP editor is just another LISP program; it operates on 
a data structure. In this case the
data structure represents a program. A simple editor could be constructed
along the lines of the %3subst%1 function:

.BEGIN SELECT 3; GROUP; TABIT2(17,25);

subst%9'%3 <= λ[[x;y;z]\[atom[z] → [equal[y;z] → x; %et%3 → z];
\ %et%3 → cons[\subst%9'%3[x;y;car[z]];
\\subst%9'%3[x;y;cdr[z]]]]]
.END
That is, we would let %3z%1 be the program; %3y%1, the piece of
text to be replaced; and %3x%1, the new text.
Such global editing %2is%1 useful, but finer control of the editing process
is necessary.

A typical editor will take an expression as input and will
then enter a "listen-loop", waiting for commands from the user.
The input expression may either be a list representing a constant
data structure, or a list representing a (constant) function.
There are commands for the  selection of a  subexpression of the
input expression; and there are commands for the replacement
of expressions with other expressions.
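As a sketch of what such commands reduce to, selection and replacement of the %3n%1th
top-level element of an expression are simple list operations; the names %3select%1
and %3replace%1 are only illustrative.
.BEGIN SELECT 3;GROUP;NOFILL;TABIT1(22);

select <= λ[[n;e][eq[n;1] → first[e]; %et%* → select[sub1[n];rest[e]]]]

replace <= λ[[n;e;x]\[eq[n;1] → concat[x;rest[e]];
\ %et%* → concat[first[e];replace[sub1[n];rest[e];x]]]]
.END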

The mechanics of presenting list structure to the user are interesting.
Since a list may be deeply nested, we need a convenient way of designating
subexpressions. On a display device we might display the selected expression
more brightly than the containing expression. That is useful, but
more structured representations of text are required.
Typically, a "pretty-printed"#({yonss(P226)}#or#{yon(P245)})
version of the text is presented.
If the text is  too extensive to fit on the display face, then
abbreviational devices are available.

If the text is deeply nested  it is  often difficult to
perceive the top level structure even in pretty-printed form.
Consider the S-expr representation of the definition of %3member%1:

.BEGIN SELECT 3;TABIT1(10);group;

(MEMBER
 (LAMBDA (X L) 
  (COND\((NULL L) NIL)
\((EQ X (FIRST L)) T) 
\(T (MEMBER X (REST L))))))
.END

In this case the structure  of the %3COND%1 is clear; but
it is clearer if we express it as:

.BEGIN SELECT 3;TABIT1(10);

\(LAMBDA (X L) (COND & & &))
.END

.BEGIN group;centerit;

Or given a  linear list:←%3(%7a b c d e f g h i j k l%3)%1
.END
it may be more instructive to display it as:
.BEGIN group;centerit;
←%3(%7a b c d e f g h i %3... )%1   or,

←%3( ...%7d e f g h i %3... )%1.
.END
where the focus of attention is controlled by the user.

There  should be commands  to move selected subexpressions
to different locations within the same structure and move expressions
to other structures. Since a common text error in LISP is the
misplacing of parentheses, there are commands to move parentheses.

There are at least two reasons for text editors: programmers make errors,
and programs which are correct need to be modified to perform other
tasks. Particularly in  exploratory programming, the "try-it-and-see" attitude
must be  supported. Thus we demand a flexible editor which can
"undo" changes to functions or constant data structure.
LISP editors have the ability to save the current 
edit structure such that an "undo" can restore to that state if
needed.


Regardless of the idiosyncrasies of a particular editor
the common feature is that LISP editors are structure-oriented
editors. They operate on well-formed S-expressions, not
text strings or card images.
A program is not a linear string of characters; it is a 
structured entity, whose parts are distinguishable
as representations of instances of programming language constructs.
The editing process
should take cognizance of that structure⊗↓ A case can be  made for
believing that the program %2construction%1 process should also
be driven by that structure ⊗↑[Han#71]↑.←.


.SS(Debugging in LISP,,P168:)
Few areas of the Computer Science field are as primitive
as that of  debugging. Few areas of the field are as important. Getting a
correct program is indeed the whole point of our programming.
The power of our debugging techniques  has been directly related to the
sophistication of the hardware/software interface which is available.
Not until the advent of sophisticated on-line systems has there really 
been any hope for practical "correct-program" construction.

Several pieces of information are required to do interactive
debugging. We need an indication of the error condition; we need the
current state of the computation; we need to have some indication of
how the computation arrived at the error condition; and, if
interactive debugging is to be meaningful, we need the ability
to modify the computation and resume  execution in that modified environment.
This last point is quite important; it has implications for programming
style.   First, we %2should%1 hope to modify an errant calculation
rather than restart the entire computation. To start over is like
repunching a whole card deck because one card was wrong. We repunch
the offending cards and retain the rest. Similarly we should explore the
possibilities of throwing away offending computations and retaining the
rest. Typically, computation is not as local and exciseable as
removing a single card; a primary purpose of most computation is to
pass a result to some other procedure.  However, if we try to localize the
effects of each procedure to simple parameter passing and value returning
then we have a better chance of discovering a point in the computation history
which is prior to the error, returning the control stack to that point,
modifying the erring procedure, and restarting the computation. This
implies that procedures should minimize their use of side-effects;
for it is side-effects which spoil the nice applicative behavior
and will require the programmer to make explicit modifications in the
computational environment before a computation can be restarted.
Such a programming style is called %2⊗→modular programming↔←%1, meaning
that each procedure is a module, or black box, dependent on other
procedures only by way of well-defined input and output considerations.
It is this style of modular programming which will enhance the use
of interactive debugging tools. 

This section will deal only with the primitive mechanisms which underlie
LISP debugging techniques. The discussions of more complex tools
which are available or are contemplated
are well-documented in other sources; ⊗↑[Int#75]↑, ⊗↑[Moo#74]↑.

Crucial  pieces of  information  about the
behavior of the program are: which functions are being  entered, what
are  the actual  parameters, and what are the  values being returned.
Assume that  we wish to monitor the behavior of the function, %3foo%*. We
will place the  real definition of %3foo%*  on another symbol table  entry
(using %3gensym[]%*)  and redefine %3foo%* such  that, when it  is called, it
will:

.BEGIN narrow 10;;

%21.%*  print the values of the current actual parameters.

%22.%*	use %3apply%* to call the real definition of %3foo%* with the current parameters.

%23.%*	print the value of the call on %3foo%*.

%24.%*	return control and the value to the calling program.
.END
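A sketch of such a redefinition for a unary %3foo%1 follows. %3FOO1%1 stands for the
%3gensym%1-ed name which holds the real definition, the call on %3apply%* is schematic
(any environment argument is elided), and the trace output format is only illustrative.
.BEGIN SELECT 3;GROUP;NOFILL;TABIT2(18,50);TURN OFF "←";

foo <= λ[[x] prog[[val]
\print[list[ARGS;FOO;x]];\%1; print the actual parameter%3
\val ← apply[FOO1;list[x]];\%1; call the real definition, saved under %3FOO1
\print[list[VALUE;FOO;val]];\%1; print the value%3
\return[val] ]]\%1; return control and the value
.END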

This technique is called tracing, and the description we have given is one
suitable to a teletype-like device. Given an interactive display and a well-defined
"LISP machine" description like that in %3peval%1#({yonss(P211)}),
a much  more satisfactory trace can be given.

Since  %3foo%*  may  be  recursive  we   should  also  give  some
indication of the  depth of recursion being executed.  Now every call
on %3foo%* will give us  the pertinent statistics.  Interpreted calls  on
%3foo%* will  go through %3eval%*, and  if %3(CALL ... FOO)%* is being  used in the
compiled  code  the  submonitor  can  pass  control  to  the  tracing
mechanism; but if the call  is a %3PUSHJ%*, the control passes
directly to the machine language code  and we will not intercept
the call.

On most implementations  of LISP the %3CALL%*   may be replaced
by a %3PUSHJ%1. This is done to speed execution, since a %3PUSHJ%1
executes at machine speed, transferring to a known location, whereas
the %3CALL%1 is passed to %3decode%1#({yon(P243)}) and 
the function definition is looked up. Typically there is a 
programmer-controlled mechanism for replacing %3CALL%1s with %3PUSHJ%1s⊗↓As
we have seen, %3CALL%1s to functional variables won't be replaced.←.
After  a  program  is
debugged,  the programmer may replace  the  %3CALL%*  with the  
%3PUSHJ%*  and  the
programs will  execute faster. On some implementations this action is
even reversible#(⊗↑[Ste#pc]↑); a table is kept, relating the %3CALL%1s
to the %3PUSHJ%1s; when tracing is desired, the %3CALL%1 version is made
available⊗↓Actually, the scheme is as follows: instead of
assembling the %3CALL%1 into a memory location, an XCT is assembled;
the XCT references a copy of a table which contains the actual %3CALL%1.
The user may replace the %3CALL%1s by %3PUSHJ%1s, but also
has the original table available to replace the modified version
when tracing is desired. This XCT trick has the additional benefit
of allowing several users to share  a "pure" piece of program, while
some people are tracing and some people are not.←.


A variant of this tracing scheme can  be used to monitor %3SET%*s
and  %3SETQ%*s.  We can modify  their definitions  to
print the  name of the  variable and the  new value, perform
the assignment, and
return.  This technique can be lost in some compiled code. If we
compile local variables as references into the value stack, we have lost both
the names and the ability to trace their behavior.
Variable references which use %3PUTV%1 and %3GETV%1 can be
traced like %3CALL%1. In fact, on %aSM%1, we have an operation
analogous to %3PUSHJ%1, so the %3CALL-PUSHJ%1 technique is open to us.
%3PUTV%1 and %3GETV%1 can be implemented as hardware %3MOVEM%1 and %3MOVE%1
instructions.

The trace facility is a debugging feature which has been
adapted from the batch-processing versions of LISP. There is a related,
but more interactive, version of this technique called the %2break%1 package.
In this mode of tracing, the user can specify that the program should halt 
on recognition of certain conditions. If that halt occurs, the  break package
is entered and the user may then type commands which survey the state of the 
computation. Expressions may be evaluated, which may themselves enter the break
package recursively. If desired, the LISP editor may be called
either to edit function definitions or to edit an expression on the
actual control stack of the current computation.


Since it is difficult to predetermine when a computation may
require debugging, several systems supply an interrupt system
analogous to that found in hardware machines. Striking certain keys
may then cause interrupts to the break package, just as if a break condition
were pre-programmed. Such a feature is useful in conjunction with the
trace package. If a trace indicates to the user that the computation
is not performing according to expectation then an interrupt key can be
struck and the computation  will be suspended.

User-definable interrupt systems apply to other areas of computation
than that of debugging. The most well-developed system is that
of MacLISP.
The ability to  selectively trace the execution, coupled with the ability to
interrupt a computation, allows  the user  to
examine computations which are suspected of divergence.


.SEC(Storage Structures and Efficiency,,,P210:)

Though it is true that any algorithm can be coded in terms of
manipulations of binary trees, there are many instances in which more
efficient organizations of data exist.
At the most elementary  level are vectors and arrays.   These
data structures could  certainly be stored in a list structure format
and individual components accessed via %3car-cdr%* chains.  However, most
machines  have a  hardware  organization which  can  be exploited  to
increase   accessing   efficiency  over   the   list  representation.
Similarly, strings can  be represented as  lists of characters.   The
string processing operations are expressible as LISP algorithms.  But
again this is usually not the most reasonable representation. Even  at
the level of list-structure operations, simple binary trees might not
be the most  expeditious representation for every problem.  Also many
of the algorithms we  have presented in  LISP are overly wasteful  of
computation time.  This section  will begin an examination of
alternatives   to  LISP  organization.     

There  is   no  best  data
representation, nor is there a  best algorithm.  The chosen representation
must match the machine and  the problem domain being studied.  If
the problem  is strictly numerical then list-structure is not the
answer; if simple string manipulation is sufficient 
then list processing is too general.  And there are many applications
of list processing which are sufficiently well-behaved that 
complex devices like garbage collectors are unnecessary.
The point  is that if you have seen the list-processing techniques in
a rich environment  such as  LISP and have  seen the applications  to
which  LISP may  be put,  then you  will be  prepared to  apply these
techniques  in  a  meaningful  way.    Many  times  an  (inefficient)
representation in  LISP  is all  that is  needed.   You  get a  clean
representation  with comprehensible algorithms.   Once you've studied
the algorithms, efficiencies might come to mind. At that  time either
recode the problem using some  of the obscene LISP programming tricks
or drop into machine language if very fine control is required.  Once
you have  %2a%*  representation it  is  easy to  get  better ones.    For
example,  once you have  a compiler  which is   correct  it is
easier to describe a smarter one.
		
This section will describe representations other than LISP binary trees
and will show some ways of increasing the efficiency of LISP programs.

.SS(Bit-tables,,P263:)
In the marking phase of a garbage collector it is
necessary to  record the visitation of each word.
Frequently it is not possible to place a mark in the  actual word.
This  might occur for  several  reasons: 
.BEGIN INDENT 0,3;
%21.%1##For words in  FS, there is  no
room if each  word contains exactly two addresses.
.SKIP 1
%22.%1# The word is
in  FWS and  the  meaning of  the information  stored there  would be
changed if we set a bit.
.SKIP 1
%23.%1# In structures, more complex than dotted pairs, there may not be room for
marking bits.
.SKIP 1
%24.%1# If a bit is assigned to each word, then  the initialize phase
requires that we visit each such word. This violates 
"locality of reference".⊗↓Locality  refers to the  relative distance between
memory locations assigned in a particular structure. In some machine organizations,
memory is divided into "pages" of a relatively small size. There is significant
overhead involved in crossing page boundaries. Therefore memory referencing
which entails many scattered references is said to violate "locality of
reference".←
.END

An alternative solution is to designate a separate section of memory called a
%2bit-table%1. The bit-table is a sequence of binary flags; and there
is a one-to-one correspondence between a flag and a markable location.
Whenever we wish to record the visiting of a word, we set the corresponding flag.
A bit table is represented as a sequence of machine locations with several
flags represented in each word. There is a simple calculation to relate a
bit in a word to its corresponding  markable location.
It is frequently faster to initialize a whole 
table rather than zero single bits in  separate words.
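For example, if each table word holds 36 flags, the flag for the %3i%1th markable cell
lies at bit %3i%1 mod 36 of word %3i%1/36 of the table. A sketch of the marking operation
follows; %3setbit%1, %3tableloc%1, %3quotient%1, %3remainder%1, and the word size of 36
are all assumptions.
.BEGIN SELECT 3;GROUP;NOFILL;

mark <= λ[[i] setbit[tableloc[quotient[i;36]];remainder[i;36]]]
.END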
.SS(Vectors and Arrays,,P137:)
.BEGIN TABIT2 (10,40);CENTERIT;FILL;
%2Vectors%*:  Vectors (or  one  dimensional arrays)  are  usually
stored sequentially in memory.  Simple vectors are usually stored one
element to a  memory location though  this is not  a necessity.   For
example a complex vector is very  naturally stored as pairs of cells;
or if  perhaps you would allow vectors of non-homogeneous data modes,
each element would  have to include type  information.  In any  case,
most languages make some restrictions on the behavior of vectors such
that efficient accessing of elements can be made.  There is a natural
simulation of  a stack as  a (sequential) vector  with access  
 made via a global pointer to the vector.
.SKIP 1
%2Arrays%*: Arrays are multi-dimensional data  structures.  Since
most  machine memories are linear  devices we must map  arrays onto a
linear representation.   We will  restrict attention to  the case  of
two-dimensional  arrays.   Most  of the  discussion generalizes  very
naturally.   A very common device is to store the array by rows; that
is,  each  row is stored sequentially,  first, row 1; then  row 2,...
Given this representation there  is a trivial calculation to find the
location of an arbitrary element, A[i;j], if we know the location  of
the first  element, A[1;1] and  the extent of  the dimensions  of the
array.  For an array A[1:M; 1:N]
.SKIP 1
←loc[A[i;j]] = loc [A[1;1]] + (i-1)*N + (j-1)
.SKIP 1
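For example, for an array A[1:3; 1:4] stored by rows beginning at location 1000,
loc[A[2;3]] = 1000 + (2-1)*4 + (3-1) = 1006.
.SKIP 1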
In languages like Fortran which require that the size of the array be
known at compile-time  the compiler can generate the accessing code
without problem.  Languages like Algol 60, and some versions of LISP,
which allow the size of an array to be determined at runtime, require
a bit more care.  Algol 60, for example, requires that you declare the
type (real, boolean,  etc.) of  the array and  specify the number  of
dimensions  in the  array,  but you  can postpone  until  runtime the
designation of the size of each dimension.  To handle this complexity
a dope vector is  introduced. 
A %2⊗→dope vector↔←%*  is typically a header or descriptor associated with
the area containing the actual array elements. The information in the dope vector
tells the functions  which access the array, how to treat the data.
Type and dimensionality are typical entries in dope vectors.

The compiler can determine  the size of
the dope vector, but  not the contents.  The dope vector is filled in
when the array declaration is executed 
and  contains information about the  actual extent of  the
array bounds.   Since  the size of the  array is not  known, the
compiler  cannot   allocate  space  for  the  array  elements.    The
allocation must also be  done at runtime.   When the array declaration  is
executed we must  know all the information about the  array.  At that
time we allocate space on the stack and complete the dope vector. Subsequent
references to array elements use the dope vector. When we leave the
block which  caused allocation of the array, the 
stack space is reclaimed.

Assume that
the  array elements are stored  by rows.  Look  at the calculation of
the location of element, A[i;j].   For specific execution of an  array
declaration much  of this information  is constant; the  location of
the array elements, in particular A[1;1], and the number of columns,
N, are fixed.  Thus we rewrite the calculation as:
.SKIP 1
\constant part\variable part

\ [loc [A[1;1]]-N-1]       +\  	(i*N+j)
.SKIP 1
The constant  part is stored  in the  dope vector.   When we wish  to
reference  an element A[i;j] we  need only compute  the variable part
and add it to the constant part.

The dope vector for A[1:M; 1:N] might contain:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		⊂ααααααααααααααα⊃
		~      2        ~
		εααααααπααααααααλ
		~   1  ~    M   ~
		εααααααβααααααααλ
		~   1  ~    N   ~
		εαααααα∀ααααααααλ
		~ constant part ~
		εαααααααααααααααλ
		~    . . .      ~
		 array elements
		~    . . .      ~
		%ααααααααααααααα$
.END


There is another scheme  for storing arrays which is  used in
some of  the Burroughs machines#(⊗↑[Org#71]↑,#⊗↑[Dor#76]↑). 
Each row  is stored sequentially and
access  to  separate  rows  is   made  through  a  device  called   a
%2⊗→mother-vector↔←%*.   The mother vector is  a vector of pointers  to the
rows.  Thus:



.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TURN OFF "\";
.TABS 50,54;
                                                  
	⊂ααααααα⊃    ⊂αααααααααααααααααα     ?αααα⊃
 	~    #ααβααα→~ A∞411∞*  A∞412∞*   ...?A∞41N∞*?~
  	εαααααααλ    %αααααααααααααααααα     ?αααα$
	~    #ααβαα→ ..
	εαααααααλ
	   . . .

	εαααααααλ    ⊂αααααααααααααααααα     ?αααα⊃
 	~    #ααβααα→~ A∞4M1∞*  A∞4M2∞*   ...?A∞4MN∞*?~
  	%ααααααα$    %αααααααααααααααααα     ?αααα$
.END

Notice that the accessing computation is very  cheap⊗↓However access
by array columns can be expensive. If each row is on a separate
page in the machine, the access overhead can be substantial.←.  Another effect
is  that all rows  need not  be in memory  at once.   On a  paging or
segmenting machine 
this array organization can be helpful.  If an access to a row not in
core  is made, a `page  fault' is raised; the  monitor brings the row
into memory and the computation continues.   The mother-vector scheme
generalizes  nicely  to   multidimensionality  and  can  be  used  in
conjunction with a dope vector.

.END
A typical implementation of an array facility in LISP would include
a declaration:

.BEGIN INDENT 0,10;
⊗→%3array%*↔←%3[%1<identifier>;<type>;<bounds>; ... ;<bounds>], where the
identifier names the array; the type could be numeric or S-expr; and  finally
a declaration of upper and lower bounds for each dimension would be needed.
%3array%* is a special form whose effect is to make the array name a %3SUBR%*,
whose code is the calculation of the dope vector. Thus:
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
              ⊂ααααααπαααα⊃    ⊂ααααααααααααα⊃
              ~ SUBR ~  #αβ→αα→~ dope vector ~
              %αααααα∀αααα$    ~ calculation ~
			       εαααααααααααααλ
			       ~    array    ~
			       ~  elements   ~
			       	    . . .
			       %ααααααααααααα$
.END
If we are to store S-exprs in the array then the ⊗→garbage collector↔← must
be able to mark the entries. This is the reason for including type information.

When an array element is to be referenced, then the subscripts are evaluated
(since the array name was declared as a %3SUBR%*) and the dope vector code
is executed, resulting in a reference to the appropriate cell.

.GROUP;
We also must be able to store information in the array.
.BEGIN INDENT 0,10;
%3store[%1<name>[<subscr>; ... ;<subscr>];<value>] : ⊗→%3store%*↔← is a special form
whose effect is to store the value of <value> in the designated array element.
.END
.APART
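As an illustration, a one-dimensional array of S-exprs might be declared and used as
follows; the type name and the form of the <bounds> argument are only meant to be
suggestive, since they vary between implementations.
.BEGIN CENTERIT;SELECT 3;GROUP;

←array[A;SEXPR;(1 10)]
←store[A[3];(X Y Z)]
←A[3] %1then has the value %3(X Y Z)
.END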
.CENT(Problem)
Implement a stack in LISP first using lists or dotted pairs, then using
an array. Include implementations of the stack operations.
.SS(Strings and Linear LISP,string processor,P27:)
.BEGIN TURN ON "←";
On {yon(P216)} we discussed one representation for LISP print names:
a linked list of full words, each containing part of the external
name for the atom. Print names are a special instance of a data structure
named strings, and our use so far in LISP of strings has been restricted to
manipulating string constants. In this section we will discuss alternative
representations for strings, and discuss components of a string-manipulating
language. 
The
organization  and  implementation  of   a  general  string  processor
directly parallels LISP.  In fact a subset of LISP,
⊗→linear LISP↔←,  is a nice notation for string algorithms.

A string is a sequence of characters.   A language
for string processing should allow the creation of strings
of arbitrary length at runtime; it should allow the generation of new
strings and the  decomposition of existing  strings.  If  strings of
arbitrary length are  to be created, an organization similar to FS in
LISP can be used with, perhaps, a string garbage collector.   We will
assume this most general case.
The machine memory will  contain at least a ⊗→string  space↔←, an
evaluator, a symbol table, and a garbage collector.

String space is a linear sequence of cells, each of which can
contain exactly  one character.   A string  will be  represented as  a
sequence of contiguous character cells.  A string variable will be an
entry in  the symbol table; the current value of the variable will be
represented as a pair: character count and a pointer to the beginning
of the character sequence.

Thus:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
  ⊂αααπααα⊃
  ~ 3 ~ # ~
  %ααα∀αβα$
        ~
        %α→ααα→⊃
               ↓
    . . .ααααπαααπαααπαααπαααπαααπααα . . .
	       A   B   B   1   D   . . .    string space
    . . .αααα∀ααα∀ααα∀ααα∀ααα∀ααα∀ααα . . .
.END
encodes the string %3ABB%*.

.BEGIN TURN OFF"←";
We must make some decisions about how  we manipulate strings: when we
perform %3x ← y%*, do we copy the symbol table pair of %3y%* into that of %3x%*, or
do we make a new string isomorphic to %3y%* and point %3x%* at it.   It makes
a  difference.    We  will   choose  the  former,  copying  only  the
`descriptor',  that  is,  we  will  share  strings  (and  substrings)
wherever possible. This decision makes the  storage requirements less
expensive, but will make our  life more difficult when we worry about
⊗→garbage collection↔←.   There are  three  primitive functions: ⊗→%3first%*↔←,
⊗→%3rest%*↔←,  ⊗→%3concat%*↔←  (read: %3car%*,  %3cdr%*,   %3cons%*).
.END

.GROUP;
.BEGIN INDENT 0,6;
%3first[x]%*  is  the first
character of the string represented by %3x%*.  %3first%* is undefined for the
empty string,#%7e%1. For example:
.END

←%3first[ABC]%1 is %3A; first[%7e%3]%1 = %9B%1.

.APART
.GROUP;
.BEGIN INDENT 0,5;
%3rest[x]%* is  the string  of characters  which remains  when the  first
character of  the string is deleted.  %3rest%* is also undefined for the
empty string. For example:
.END

←%3rest[ABC]%* is %3BC%*

.APART
.GROUP;

.BEGIN INDENT 0,9;
%3concat[x;y]%* is a function of two arguments.   %3x%* is a character; %3y%* is
a  string. %3concat%* forms  a string  consisting of the  concatenation of a
copy of %3x%* and a copy of the string, %3y%*.  For example:
.END

←%3concat[A;BC]%* is %3ABC%*.
.APART
.GROUP;
	
There are three string predicates:

.BEGIN CENTERIT;
←⊗→%3char%*↔←%3[x]%1:  is %3x%* a single character?
←⊗→%3null%*↔←%3[x]%1:  is %3x%* the empty string?
←%3x ⊗→%3=%*↔← y%1:  are %3x%* and %3y%* the same character?

For example:

←%3char[A] %1is %et%1
←%3char[AB] %1is %ef%1
←%3AB = AB %1is %9B%1
.END
.APART

The implementations of these string primitives are somewhat less complex
than those of LISP's primitives.
%3first%* generates a character  count of 1 and a  pointer to the
first character of the parent string.
%3rest%* generates a character count of one less than that of the
parent and a pointer to the second character of the parent string.

%3concat%* is a  bit more  problematic.   We copy %3x%*  and copy  %3y%*
so that %3y%1 follows %3x%1 in free string space; we
generate  a character  count of  one plus the count of  %3y%*,  and
generate a pointer to the copy of %3x%*.  The copies are
made in the free string space pointed to by the %2⊗→string space pointer↔←%*.
The obvious  question of  what to do  when string space  is exhausted
will be postponed  for a moment.   The implementations  of the  three
predicates are easy: we will blur  the distinction between characters
and strings  of length 1.  Thus %3char%*  need  only check the character count.
%3null%* gives %et%1 if the count is 0.  What about#=#?  Obviously characters
are  not  stored   uniquely,  so  we must  make  an  actual character
comparison.
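
To make these descriptor manipulations concrete, here is a sketch of the
primitives. It is only illustrative: we assume that a descriptor is represented
as a dotted pair of character count and string space address; that %3fetch_char%1
retrieves the character stored at such an address and that characters may be
compared with %3eq%1; and that %3copy_to_free%1 copies the characters of a string
into the free region of string space, advancing the string space pointer and
returning the address of the first character copied. We write %3same%1 for
the infix %3=%1.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

descr <= λ[[n;p] cons[n;p]]
count <= λ[[d] car[d]]
ptr <= λ[[d] cdr[d]]

first <= λ[[x] [null[x] → err[]; %et%3 → descr[1;ptr[x]]]]

rest <= λ[[x] [null[x] → err[]; %et%3 → descr[sub1[count[x]];add1[ptr[x]]]]]

concat <= λ[[x;y] prog[[loc]
      loc ← copy_to_free[x];
      copy_to_free[y];
      return[descr[add1[count[y]];loc]] ]]

char <= λ[[x] onep[count[x]]]
null <= λ[[x] zerop[count[x]]]
same <= λ[[x;y] eq[fetch_char[ptr[x]];fetch_char[ptr[y]]]]
.END
Notice that %3first%1 and %3rest%1 build only a new descriptor; no characters
are copied until %3concat%1 is applied.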

The implementation of storage management in a string processor can be more
complex than a simple LISP implementation. We use a garbage collection
scheme which is in some ways simpler, and in some ways more complicated,
than a collector for list structure. The marking phase is much simpler since we use the
descriptor in the symbol table to mark  the character string.  Since we
are sharing  substrings, we cannot stop the marking simply because we
have encountered a previously  marked character, but at least
the marking is not recursive.   
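
For example, the marking of the string referenced by a single descriptor is a
simple loop. Using the descriptor selectors %3count%1 and %3ptr%1 of the sketch
above, and assuming that %3markchar%1 sets the mark bit associated with a
string space address:
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

markstring <= λ[[d] prog[[n;p]
      n ← count[d];
      p ← ptr[d];
loop  [zerop[n] → return[%et%3]];
      markchar[p];
      p ← add1[p];
      n ← sub1[n];
      go[loop] ]]
.END
The marker simply applies %3markstring%1 to each descriptor in the symbol table.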

However, the collection
phase needs to be  more sophisticated for  string processors.   Since
strings are  stored linearly (or  sequentially), a  fragmented string
space  list  is  of  little  use.    Thus  we  must compact  all  the
referenceable strings into one end of string space, freeing  a linear
block for the new free string space.  Since we are sharing substrings
a  little care  needs  to  be exercised.    When  we move  a  string,
obviously the descriptor of any variable referencing any part of that
parent string must be changed to reflect the new location.  So before
we begin the relocation of strings  we sort the string descriptors on
the  basis of their  pointers into  string space.   We then recognize
each parent string, moving it down into freed  locations and updating
the address  pointers of any  substrings.  We  continue this process.
Eventually all strings will  be compacted down  in string space.  The
string space  pointer will  be set  and the  computation can  continue.
Compacting garbage collectors  can be adapted for use in LISP or more
general types of data structures.

.END
.SS(A Compacting Collector for LISP,,P291:)
.P173:
We can combine the simplicity of the original mark-sweep garbage collector
with the sophistication of the collection phase of string 
garbage collector and 
produce a  compacting garbage collector for LISP.

The intention is to move all active structures to contiguous locations at 
one end of the appropriate space. 
There are several motivations for compacting storage. First, besides making the
%2active%* storage contiguous, we also make the %2free%* locations contiguous.
Thus the free lists can be handled as arrays rather than as lists. To get the next
free element, take the next element in the free array. 

The second reason for considering  compaction comes up in languages other
than LISP⊗↓As we shall soon see, the rationale is
applicable in LISP as well.←. If the language allocates storage in a manner similar to LISP
but the constructs allow %2different%* sized blocks to be specified
 (a string processor is a simple example), then
compaction is a strong contender. To subdue the fragmentation of memory
we could consider compacting all free locations to one end of memory.
More general schemes are also viable. We will talk about these possibilities
when we discuss dynamic allocation.

Another reason for 
concern with compacting is related to hardware. If the underlying machine is
using a paging scheme, then we can try to minimize page-faults by keeping
the LISP structures localized. In the worst case, we could have every element
of a list on a separate page; this would obviously increase machine 
overhead⊗↓Very little empirical work has been done on the actual storage
requirements and running environment of LISP. 
A start is made in ⊗↑[Cl#76]↑; much more should be done.←.
However,
we cannot restrict the operations of the LISP programmer. The underlying hardware
%2must%* be invisible to the user. The next best thing is to try to keep
the structures as local as possible; compaction of spaces is a first attempt
at this. We will discuss other lower-level tricks later.

So granted that compaction is a worthwhile endeavor, how do we proceed?
Clearly we can't simply mark all the
active cells and then move them into unmarked cells to compact the space.
We must also maintain the original topological relationships between the elements.

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 40;
	   ⊂αααπααα⊃     ?     ⊂αααπααα⊃     
      204  ~204~204~∞1 is not the same as:∞b   ?100  ~204~204~     
	   %ααα∀ααα$     ?     %ααα∀ααα$                
.END
                                                 
Besides moving the cells, we must also update each reference to a moved location:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 40;
	   ⊂αααπααα⊃     ?     ⊂αααπααα⊃     
      204  ~204~204~∞2 is∞1 the same as:∞b   ?100  ~100~100~     
	   %ααα∀ααα$     ?     %ααα∀ααα$                
.END

To handle these problems, we expand the sweep phase into two phases:
the %2relocating%* phase and the %2updating%* phase. As with the sweep phase,
there are relocating and updating phases for %2each%* space.

The relocating phase begins after all active structure is marked.
Assume we are to compact all active structure %2down%* to the
bottom of the space. First we initialize two pointers: a %2free%* pointer
to the lowest cell in the space; and an %2active%* pointer to the top of the space.
We move the active pointer down until we come across a marked location;
we move the free pointer up until we locate an %2unmarked%* cell. 
We want to  move that marked cell down into the free location; but
we must also supply enough information to maintain the original relationships
in the transformed structure. The cell we move  may reference other cells
which will be moved.  

.BEGIN GROUP;
Here's a picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
 		     . . .
		   ⊂αααπααα⊃     
	       77  ~ 65~402~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃     
	      100  ~   ~   ~ ←∞1free pointer∞*
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      155  ~   ~   ~ 
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      204  ~402~ 77~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃ 
	      402  ~204~402~ ←∞1active pointer∞*
		   %ααα∀ααα$ 
		     . . .
.END
.END

Cell %b77%* was active so we left it alone; it references cell %b65%*,
which  has already been visited; and also references cell %b402%* which is about
to move. We move the contents of cell %b402%* into cell %b100%*, and to
let everyone know where it has gone, we leave a forwarding address of
%b100%* in location %b402%*.
.BEGIN GROUP;
Thus:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		   ⊂αααπααα⊃     
	      100  ~204~402~ ←∞1free pointer∞*
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃ 
	      402  ~    100~ ←∞1active pointer∞*
		   %ααα∀ααα$ 
		     . . .
.END
.END
The active pointer, having writ, moves on; 
it skips over any unmarked cells, looking for the next marked
location. Assume the next marked
location is %b204%*. It stops there and waits for the free pointer
to discover that location %b155%* is the next free location.
In its search the free pointer will  skip over any marked cells.
The same relocation operation occurs: the contents of %b204%* is moved
to location %b155%*, and the forwarding address of %b155%* is placed in location
%b204%*. The process continues until the two pointers collide.
Call that collision point %acol%*.
When they  meet, all locations above %acol%* will either be free or will
contain forwarding addresses. All addresses, %acol%* and below, will contain
marked words or relocated cells.
We are now ready to enter the %2update%* phase.

.BEGIN GROUP;
Here is the picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
 		     . . .
		   ⊂αααπααα⊃     
	       77  ~ 65~402~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃     
	      100  ~204~402~ 
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      155  ~402~ 77~ 
		   %ααα∀ααα$                
                     . . .   ← ∞acol∞*
		   ⊂αααπααα⊃     
	      204  ~   ~155~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃ 
	      402  ~   ~100~ 
		   %ααα∀ααα$ 
		     . . .
.END
.END
We examine the initial segment of our space from the bottom to %acol%*
looking for any references to that area %2above%* %acol%*. A reference
to that area must be changed. What is found in the referenced cell is not
the desired information, but is the forwarding address of the desired information.
It is obvious what to do: tell the sender what the change of address is.
Thus the %3cdr%*-part of cell %b77%* becomes %b100%*; the %3car%*-part doesn't 
change. Cell %b100%* refers to two relocated cells; we find their forwarding 
addresses, and cell %b100%* becomes:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		   ⊂αααπααα⊃     
	      100  ~155~100~ 
		   %ααα∀ααα$                
.END
Similar treatment is given to cell %b155%*, modifying the %3car%*-part.
When all cells below %acol%* are updated, the garbage collection is finished.
The cells above %acol%* are all available for the free-list.
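
A sketch of the relocating phase may make the pointer movements more concrete.
The primitives %3markedp%1, %3fetch%1, and %3store%1, for testing and moving the
contents of a cell at a given address, and %3put_forward%1, which plants a
forwarding address in a vacated cell, are assumed for illustration only; the
marking and updating phases are not shown.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

relocate <= λ[[bottom;top] prog[[free;active]
      free ← bottom;
      active ← top;
loop  [not[lessp[free;active]] → return[free]];       %1the pointers have collided%3
      [not[markedp[active]] → active ← sub1[active]; go[loop]];
scan  [markedp[free] → free ← add1[free]; go[scan]];
      [not[lessp[free;active]] → return[free]];
      store[free;fetch[active]];                      %1move the cell down%3
      put_forward[active;free];                       %1leave a forwarding address%3
      free ← add1[free];
      active ← sub1[active];
      go[loop] ]]
.END
The value returned is the collision point %acol%1.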

.CENT(Problems)
%21.%* Is %acol%* in the free space list after the update phase?

%22.%1 Write a LISP algorithm for compacting garbage collection in LISP.
.SS(Representations of Complex Data Structures,,P138:)
In our discussion of abstract context free data structures in {yonss(P287)},
we isolated three kinds of  structures.  

.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 ...  %eD%41%1  
←e.g.   <seq> ::= %2(%*<seq elem>%2,%* ..., <seq elem>%2)%*
.END

.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 | %eD%42%1 | %eD%43%1  
←e.g.  <seq elem> ::= <indiv> | <seq>
.END

.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 %eD%42%1 %eD%43%1... %eD%4n%1
←e.g.     <sexpr> ::= %3(%*<sexpr> . <sexpr>%3)%*
.END

We have discussed the behavioral characteristics of algorithms
which operate on these structures. Now we wish to examine
the storage structure aspects of these data structures.

Corresponding to these three
data structures are three "natural" storage representations.
By "natural" we mean that even though all these structures
can be represented as LISP S-expressions, for example, there
are representations which might better suit the operations
which we expect to perform on those structures.
Since "natural" is not a well-defined term, we will 
clarify its meaning using examples of    context free data structures.

The first type of data structure given above maps naturally onto
a representation which records that the object is of
type %eD%1 and contains space for storing an instance of this data type.
Elements of type %eD%1 are homogeneous, being all of type %eD%41%1;
however the size of a type %eD%1 element is indefinite.
Depending on the operations which are to be performed on the representation,
either a list-representation or  an array representation is reasonable
for the storage structure. Unless the operations are quite
complex, a sequential allocation scheme suffices.


The second type of data structure  is frequently represented as a pointer.
There really isn't any storage allocated for objects of this type.
Instances  which satisfy this equation have their storage
requirements set by one of the %eD%4i%1 alternatives.
We will discuss pointer manipulation in LISP in the next section.

This section will discuss the third abstract data structure.
The essential characteristic here is that instances of this structure
have a fixed number of components, and the types of 
those components need not be homogeneous. 
Those components are typically referenced by name.
These characteristics form a natural distinction between
this third class and  the first class, even though an appropriate encoding
would make it possible to represent either class in the other.

.GROUP;
For example, in equations like:

.BEGIN CENTER;
<sexpr> ::= %3(%1<sexpr> . <sexpr>%3)%1  or

<form> ::= <function>[<arg-list>],
.END

we reference components by  selectors like %3car%1, %3cdr%1, %3func%1
and %3arglist%1. 
.APART;
LISP  represents instances of the above equations
as objects of the first and second types of data structure: 
variable-length lists of pointers.
As a result, we have thought of these selectors as operations
 which might require some non-trivial amount of computation
to discover the desired component, but  as we saw in
{yonss(P290)} what is algorithm and what is data depends on your point of
view. For example, we  could  think of a dotted pair as an array which has
two components, one referenced by %3car%1, one referenced by %3cdr%1.
We say "array", since the number of components is known; but the
element references are done by non-numerical names.

The natural storage requirements for such objects imply a fixed amount
of storage. That storage can be sequentially allocated since the 
size of the  element will  not vary. The representation must also
encode the scheme for associating external selectors with the internal
representation.  For example:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

     	⊂αααααπααααα⊃
        ~ CAR ~     ~
	εαααααβαααααλ
        ~ CDR ~     ~
       	%ααααα∀ααααα$  		  
.END

Notice that the  array  referencing mechanisms have  to solve a similar
problem. However, array representation is such that the dope vector
can perform a %2calculation%1 to locate the element.
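
One way to make the selector association explicit is to keep, with each record
class, a table pairing selector names with positions. The following sketch is
only suggestive: a record instance is represented as a list of component values,
and %3selectors%1 is assumed to map a record class to an association list such
as %3((CAR . 1) (CDR . 2))%1.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

select <= λ[[name;class;rec] component[rec;cdr[assoc[name;selectors[class]]]]]

component <= λ[[rec;n] [onep[n] → car[rec]; %et%3 → component[cdr[rec];sub1[n]]]]
.END
An array implementation replaces %3component%1 by an address calculation
performed from the dope vector.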

The storage element which we have been developing is called
a %2⊗→record↔←%1 (⊗↑[Pop#68]↑), or a %2structure%1 (⊗↑[Alg#75]↑,
⊗↑[EL1#74]↑), or a %2plex%1 (⊗↑[Han#69]↑)⊗↓A similar device, called a 
hunk,  has been implemented
in MacLISP ⊗↑[Ste#pc]↑.←.

.BEGIN TURN OFF "←";
Besides the usual constructors, selectors and recognizers, records
are typically supplied with a function 
to modify components of a structure. This function is called  an 
%2updater%1.
Just as we can write %3A[43]#←#56%1 where %3A%1 is an array, an
updater function would implement a statement like %3car[x]#←#(A#.#B)%1.
.END
Updating of simple variables is called assignment.
A discussion of "updating"  of more general
data structures requires a deeper understanding of
the implementation and storage  structures. In the  case of
LISP, it requires a discussion of 
the modification of pointers. That is  the topic of the next
section.
.SS(%3rplaca%* and %3rplacd%*,,P171:)
.BEGIN TURN ON "←";
The discussion in {yonsec(P188)} developed an implementation of the
LISP operations in terms of  the manipulation of pointers.
Those manipulations allowed the 
creation of new structure or allowed
sharing of an existing structure. None of these operations involved the
modification of an existing structure.
In this section we will discuss 
some LISP  coding
tricks which do involve modification operations.
.BEGIN GROUP;
First consider:

%3←append <= λ[[x;y][null[x] → y; %et%3 → concat[first[x];append[rest[x];y]]]]%1

.END
This function copies %3x%* onto the front of %3y%*. Note: %3y%* is not copied.
Or recall the %3subst%* function: it generates  a copy with the correct
substitutions made. LISP operations must make copies in general, otherwise 
some very 
unsavory side effects can  happen.

Consider the expression %3append[(A B C);(D E F)]%*.  It appears that we
could get the  effect of %3append%* by %3rest%*-ing down the list
%3(A B C)%* until we found the terminator, and then replace 
that terminator with a pointer to the list %3(D E F)%*.  Thus:


.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃  ⊂αααπααα⊃  ⊂αααπαα⊃     ⊂αααπααα⊃  ⊂αααπααα⊃  ⊂αααπαα⊃
~ A ~ #αβα→~ B ~ #αβα→~ C ~≤'. . .→~ D ~ #αβα→~ E ~ #αβα→~ F ~≤'~
%ααα∀ααα$  %ααα∀ααα$  %ααα∀αα$     %ααα∀ααα$  %ααα∀ααα$  %ααα∀αα$

.END

The resulting structure does indeed look like that which we would have
obtained from %3append%1. The  operation we performed
%2modified%1 the existing structure, rather than %2copying%1 it as %3append%1
would have done. Such modifications can cause serious difficulties.
.GROUP;
Let us call the modifying version of %3append%1, %3nconc%1; and consider the
execution of the following sequence of statements:

.BEGIN TABIT1(30);TURN OFF"←";SELECT 3;

\i ← (A,B,C)

\j ← (D,E,F)

\k ← nconc[i;j]
.END
.APART

After execution of the third statement, %3k%1 would have value %3(A#B#C#D#E#F)%1.
The difficulty is that %3i%1 would  %2also%1 have this value since we
modified the structure assigned to %3i%1.
Also, any value
which was sharing  part of the structure  of %3i%* will also  be changed.
Language features which   surreptitiously change values  are evil.   Sometimes,
however,   it is quite useful to be  evil.  
The modification procedures, such as %3nconc%1, can be defined
in  terms of  two  obscene
functions: %3rplaca%* and %3rplacd%1.


.BEGIN GROUP;
The  procedure  ⊗→%3rplaca%*↔←%3[x;y]%*  replaces the %3car%*-part of %3x%* with %3y%*.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";


   ∞3dest∞*	     		       ∞3env∞*
    ~				   ~
    ~	⊂ααααααα⊃		   ~	⊂ααααααα⊃
    %→  ~     #αβα⊃ 		   %→   ~     #αβαα###→
	εαααπαααλ ↓			εαααπαααλ 
          # # #	 ←$			~ x ~ # βα→α⊃
	~   ~   β→ - - →⊃		εαααβαααλ   ~
	εαααβαααλ  	  		~ y ~ #αβ→⊃ ↓
        ~   ~   ~	↓               %ααα∀ααα$ ↓ ~
          # # #	      ⊂αααπααα⊃	        ⊂ααααααα⊃ ~ ↓
       	εαααβαααλ     ~ # ~ ? ~  ⊂ - - →~   ?   ~←$ ~
        ~   ~   ~   ⊂→%α|α∀ααα$	 |      %ααααααα$   ~
       	%ααα∀ααα$   ↑	% - → - →$                  ↓
		    %←ααα←ααααα←αααα←ααααα←ααααα←ααα$

.BEGIN CENTER;SELECT 2;
Algorithm for ∞3rplaca∞1
.END
.END
.END



The second function is
⊗→%3rplacd%*↔←%3[x;y]%* meaning replace the %3cdr%*-part of %3x%* with %3y%*.
The  AMBIT/G  description of this function was given on {yon(P246)}.


.BEGIN GROUP
Now %3nconc%* can be defined as⊗↓Since we're really involved with the representation
we use %3car%* and %3cdr%* rather than %3first%* and %3rest%*.←:
.BEGIN SELECT 3;TABIT2(16,19);TURN OFF"←";

nconc <= λ[[x;y] prog\[[z]
\\[null[x] → return[y]];
\\z ← x;
\a\[null[cdr[z]] → rplacd[z;y]; return [x]];
\\z ←cdr [z];
\\go[a] ]]

.END
.END

.BEGIN SELECT 3;TABIT2(23,30);TURN OFF"←";GROUP;
%1Consider:%*\prog[[x]\x ← (NOTHING CAN GO WRONG);
\\rplacd[cdddr[x];cddr[x]];
\\print[x]]

.END

We can use %3rplacd%* to generate circular lists.
Circular lists cannot be generated in LISP without functions like
%3rplaca%* and %3rplacd%*.
See the problem on {yon(P55)}.
In  general,  to circularize  a non-empty
list, %3x%*, %3rplacd[last[x];x]%* suffices where:

←%3last <=λ[[x][null[cdr[x]] → x; %et%3 → last[cdr[x]]]]%1

Functions which modify list structure must be  used with  extreme caution.   
They are  not
recommended for amateur hacking.  They are introduced here simply to
show that  it  is  possible to  improve  on the  efficiency  of  pure
algorithms by resorting to these coding tricks.

.BEGIN GROUP
←%2Problems%*

%21.%1  What is the effect of evaluating %3rplacd[x;cdr[x]]%*?

%22.%1 Recall the problem on ⊗→hash consing↔← on {yon(P154)}.
There we were contemplating unique storage for all S-exprs.
Can such a scheme be reconciled (efficiently) with functions like
%3rplaca%* and %3rplacd%*?

%23.%1 It has been pointed out that %3rplaca%1 and %3rplacd%1 are
closely related to assignment statements ⊗↑[And#76]↑. 
Extend one of our evaluators to recognize expressions like:
.BEGIN CENTER;TURN OFF "←";
%3car[%1<form>%3] ← %1<form>
.END
as abbreviations for:
.BEGIN CENTER;SELECT 3;
rplaca[%1<form>; <form>%3]%1
.END
This extension of assignment is obviously not restricted to
%3rplaca%1 but could allow arbitrary forms on the left-hand
side of an assignment.
.END
.END


.SS(Applications of %3rplaca%* and %3rplacd%*,,P155:)
.BEGIN TURN ON "←";
We begin with rather simple examples. Consider the  problem of inserting
an element into the middle of a list.  For example let %3x%* be the list,
%3(A B C)%*.  If we wished to insert an atom, say %3D%*, between %3B%*
and %3C%*, we could perform:

.BEGIN CENTER; SELECT 3;turn off"←";
x ← cons[car[x];cons[cadr[x];cons[D;cddr[x]]]].
.END

In general, we have little choice but to recopy  the initial segment
of %3x%*, adding %3D%* into the appropriate place.  A similar technique is
obviously available to delete a specified element from the interior of a list.

Careful use of %3rplacd%* can, in some instances, replace the heavy use of %3cons%*.
%3cons%* always carries with it the threat of a call on the garbage collector.

For example, given the list %3(A B C)%* with pointers, %3x%* and %3y%*, into it
as follows,

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβαα→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.END


we could insert the element, %3D%*, %2after%* the first element in %3y%* by:

←%3rplacd[y;cons[D;cdr[y]]]%*, giving⊗↓Notice 
that %2one%* %3cons%* is unavoidable.←:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃        ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβα⊃    ⊂→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$ ↓ 	  ↑ %ααα∀αα$
			     ↓    ~
		        ⊂αααπααα⊃ ↑
			~ D ~ #αβα$
			%ααα∀ααα$

.END

But as always, be warned that the value of %3x%* has also been changed; and
any S-expr sharing the list %3x%* or %3y%* as a sublist has also been affected.

We can also use %3rplacd%* to delete not the %2first%*, but the next element,
in %3y%* by:

←%3rplacd[y;cddr[y]].%*

Similarly, we can use %3rplaca%* to modify an element in a list (or S-expr).
To change the first element in the list, %3y%*, to the S-expr, %3z%* use

←%3rplaca[y;z]%*.
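
These idioms are simple enough to package as functions; a sketch, using %3prog2%1
to return the modified cell, and assuming in %3delete_after%1 that an element
actually follows %3y%1:
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

insert_after <= λ[[y;d] prog2[rplacd[y;cons[d;cdr[y]]];y]]

delete_after <= λ[[y] prog2[rplacd[y;cddr[y]];y]]

replace_at <= λ[[y;z] prog2[rplaca[y;z];y]]
.END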

Notice that the uses of %3rplacd%* for insertion and deletion are couched in terms
of insert %2after%* and delete %2after%*, rather than insert %2at%* or delete %2at%*.
If you look at a diagram you will see why:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
x	       
~	       
↓   ⊂αααπααα⊃      ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.END

To delete the element %3B%* requires modifying the %3cdr%*-part of the predecessor
cell; a similar remark applies  to insertion at a specified cell. 
A simple, perhaps inefficient, scheme to support such modification
would be to start a second pointer
 from the beginning of the list, looking for the cell whose %3cdr%* pointed to the
desired spot; then make the modification.  

If these `modification-%2at%*' functions were to be performed very frequently
then it might be worth starting %2two%* pointers down the list, one at %3x%*,
one say %3y%* at %3cdr[x]%*, as above.  Then %2testing%* could be done using %3y%*
and the %2modification%* could be done using %3x%*. When we move %3y%* to
%3cdr[y]%* we move %3x%* to %3cdr[x]%*.  If we wanted to modify
%2before%* rather than %2at%*, we could proliferate the `back pointers', but
 if this kind of generality is required a change of representation
is called for.  We might resort to the double-linking scheme introduced on 
{yon(P164)}; more complex representations are also discussed
in detail in ⊗↑[Knu#68]↑, Chapter 2.
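
As an illustration of such a trailing pointer, here is a sketch of a destructive
deletion %2at%1 the first top-level occurrence of an element. The elements are
restricted to atoms so that %3eq%1 suffices, and the possibly different list is
returned, since a match in the first cell cannot be spliced out without a
predecessor:
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

delete_at <= λ[[a;x] prog[[back;front]
      [null[x] → return[x];
       eq[car[x];a] → return[cdr[x]]];
      back ← x;
      front ← cdr[x];
loop  [null[front] → return[x];
       eq[car[front];a] → rplacd[back;cdr[front]]; return[x]];
      back ← front;
      front ← cdr[front];
      go[loop] ]]
.END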

A LISP 
implementation which stores p-lists as list structure would 
use %3rplaca%* and %3rplacd%* heavily; for example, functions
which modify properties on the p-lists would use these functions. Here are the
two p-list manipulating functions, ⊗→%3putprop%*↔← and ⊗→%3remprop%*↔←.

.GROUP;
.BEGIN INDENT 0,8;

%3putprop%* was introduced on {yon(P52)}.
Recall that the effect of %3putprop%* is to attach an indicator-value pair
to an atom. If the indicator is already present then we will simply change its 
value;  
if the indicator is not present then we will add the indicator-value 
pair to the front of the p-list. 
In the definition %3n%* is an atom, %3i%* is an indicator, and %3v%* is 
the value to be stored.
.END

.BEGIN SELECT 3;TABIT2(17,20);GROUP; TURN OFF"←";

putprop <= λ[[n;v;i] prog[[m]
\\m ← cdr[n];
\a\[eq[car[m];i] → rplaca[cdr[m];v];return[v]];
\\m ← cddr[m];
\\[null[m] → rplacd[n;cons[i;cons[v;cdr[n]]]];return[v]];
\\go[a] ]]
.END
Note that extended conditional expressions are used in the definition.

.APART
.GROUP;
.BEGIN INDENT 0,8;
%3remprop%* was also introduced on {yon(P52)}.
%3remprop%* is a predicate, used to remove attribute-value pairs from
the property list of an atom.
We will capitalize on the LISP %3NIL%1-non#%3NIL%1 trick
for predicates and return the removed property value if one is found.
The following implementation of %3remprop%1 does that.
.END

.BEGIN SELECT 3;TABIT2(15,18);GROUP;TURN OFF"←";

remprop <= λ[[n;i] prog[[m;v]
\\m ← n;
\a\[eq[cadr[m];i] → v ← caddr[m]; rplacd[m;cdddr[m]]; return[v]];
\\m ← cddr[m];
\\[null[cdr[m]] → return[%ef%3]];
\\go[a]]]

.END
.APART
Applications of %3rplacd%* will occur inside ⊗→%3ratom%*↔← and
on {yon(P160)}
we will develop a version of LISP's parser which uses pointer modification to
gain efficiency.
Pointer modification is also used
inside the ⊗→garbage collector↔←.  Recall the %3sweep%1 phase of the collector
on {yon(P257)}.


.P161:
Finally, one of LISP's less illustrious uses of pointer modification
is to "allow" the construction of self-modifying programs.
This technique is similar  to the machine language tricks of self-modifying code
and should be used with similar frequency.
The freedom to hang yourself should not be construed as an invitation, but
it again points out the similarities of LISP with machine language and
highlights the differences with LISP's contemporaries.

LISP's  central processor %3eval%* operates by traversing and interpreting
the data structure representation of the program; that data structure is also
open for inspection by LISP's data structure manipulating functions.
Since we now have list-modifying functions, we could conceivably modify a program
by changing its internal structure. Indeed we can write a program which
modifies its %2own%* structure. 

.BEGIN TABIT2(14,16);GROUP;TURN OFF "←";
Here's one:
.SELECT 3;

foo <= λ[[x] prog[[y;z]
\\z←1;
\\y←sixth[body[foo]];
\a\print[x];
\\rplaca[rest[y];z←add1[z]];
\\go[a] ]]

.END
The mystery created by %3y%* is a pointer into the representation of the
statement %3print[x]%*; that representation is %3(PRINT#X)%*. Therefore
the effect of the first %3rplaca%* is to change %3(PRINT#X)%* to %3(PRINT#2)%*.
Subsequent passes through the loop will change the statement to print %33, 4, ...%*.
There really isn't much that can be said about such a program.

.CENT(Problems)
I. More on %3ratom%*.
Recall the discussion of %3ratom%* in {yonss(P13)} and {yonss(P14)}.
Now that you know about %3rplaca%* and %3rplacd%* write a more detailed
version of %3ratom%*.

II. Recall the function which reverses the top-level elements of a list.
On {yon(P16)} and {yon(P23)}
we wrote it in various styles and with varying degrees of efficiency.
All of these functions used %3cons%*; however it is clear that we should be
able to reverse a list without using %2any%* new cells.
Express this
algorithm as a LISP function. If you use %3prog%* don't use any %3prog%*-variables.

.END
.SS(Numbers,numbers)

In most implementations of LISP, numbers are stored  as very simple kinds of
atoms:  they do not  need print names,  but since  we should probably
allow fixed and floating point representation, we do  need indicators
for these properties.  Thus:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

fixed-point 1
~
↓
~  ⊂ααααααααπααα⊃   ⊂ααααα⊃ 
%→ ~ FIXNUM ~ #αβαα→~  1  ~
   %αααααααα∀ααα$   %ααααα$

floating-point 1
~
↓
~  ⊂ααααααααπααα⊃   ⊂ααααααααααααααααααααααα⊃ 
%→ ~ FLONUM ~ #αβαα→~ <machine rep  of 1.0> ~
   %αααααααα∀ααα$   %ααααααααααααααααααααααα$

.END
In this representation, the number is stored in FWS and the 
type is indicated by a minimal property list.
This representation
is a  bit expensive in space and can be expensive to manipulate with arithmetic
operators. Several tricks have been used to improve arithmetic in LISP.

Assume that the  addressing
space of the machine  is 2%818%* and that the usual  size of a LISP
core image is significantly smaller, say, N.  Then all memory address
references greater than N  are illegal (and trapped by  the monitor).
What we  will do is use  these illegal addresses to encode  some of the
smaller positive and  negative integers, mapping  zero on the  middle
address, the  positive numbers to  lower addresses and  the negatives
onto  the  higher   addresses.    Thus  these  smaller  integers, called
%2⊗→inums↔←%*,  are
represented by pointers outside of the normal LISP  addressing space.
This  trick can  considerably decrease  the storage  requirements for
jobs  heavily using small  numbers.  (Numbers are  not usually stored
uniquely).

.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 20;

	⊂αααααααααα⊃ 2∞818∞*
	~?~
	~?~
	~?~ m ∞1<∞* 0
	~?~
	~?~ m = 0
	~?~
	~?~ m ∞1>∞* 0
	~?~
	~?~
	εααααααααααλ N
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	%αααααααααα$ 0



.BEGIN CENTERIT;
∞2←Picture of INUM Space∞*
.END
.END
.APART

The INUM representation takes less space but adds an additional complexity:
now the arithmetic operators have to deal with three different kinds of numbers.
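
A sketch of the kind of dispatch an addition routine must now perform; the
recognizers %3inump%1 and %3floatp%1, the coercions %3tofix%1 and %3tofloat%1,
the machine-level adders %3inumadd%1, %3fixadd%1, and %3floadd%1, and
%3make_num%1, which selects an INUM or FIXNUM representation for its result, are
all hypothetical names:
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

plus <= λ[[x;y]
     [and[inump[x];inump[y]] → make_num[inumadd[x;y]];
      or[floatp[x];floatp[y]] → floadd[tofloat[x];tofloat[y]];
      %et%3 → fixadd[tofix[x];tofix[y]] ]]
.END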

The MacLISP (⊗↑[Moo#74]↑) implementation uses a different representation for 
numbers⊗↓Much care went into the MacLISP  number
handling since it is used as the implementation language for MACSYMA 
(⊗↑[MAC#74]↑,#⊗↑[Wan#75]↑,#⊗↑[Mos#74]↑),
a large algebraic and symbolic mathematics system. MacLISP's efficient
number facilities, coupled with its  optimizing compiler,
have resulted in the production of compiled code which is more efficient
than that produced by DEC's FORTRAN compiler; ⊗↑[Fat#73]↑.←.
In that implementation, two spaces are allocated for number storage:
FIXNUM space and FLONUM space. This  makes a more compact representation since
the type information is implied in the address of the object, rather than explicitly
stored. To those basic spaces we add two temporary stack areas: FIXPDL and
FLOPDL. These areas are used for temporary arithmetic computation.

The temporary areas work in conjunction with a  type declaration option
available in MacLISP. 
If we know that certain variables are %2always%1
going to be used as numbers in a particular  function then  we can compile
better code. Assume %3x%1 and %3y%1 are to be used %2only%1 as FIXNUMs
within a function %3f%1; we would make such declarations for the LISP compiler
just as we can declare some variables as "special" to LISP compilers.
When we allocate space for %3x%1 and %3y%1, we allocate space on the
top of FIXPDL. Within %3f%1 the arithmetic operations use the hardware
arithmetic and reference the stack elements. The stack elements can be
passed to other arithmetic functions called within %3f%1 and no permanent
storage need be allocated in FIXNUM space until later.
The efficiency  of arithmetic operations is dependent on the existence
of special hardware instructions for such arithmetic.
However, special hardware also places limits on the arithmetic
capabilities of most languages:  arithmetic
is limited by the word size. LISP is able to overcome such limitations.

Most  numerically oriented  programs  are  faced  at some  time  with
overflow.  That is, they attempt  to construct a number  which is too
large to  be represented  in one  machine location.    There are  now
several   versions   of   LISP  which   will   automatically   change
representation  when faced  with overflow.   
This scheme is called arbitrary precision arithmetic and
has been implemented for both fixed point and floating point
numbers. We will describe a representation for fixed point numbers
called Bignums; they could have the following structure:


.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

positive big number
~
↓
~  ⊂ααααααααπααα⊃   ⊂αααπααα⊃        ⊂αααπαα⊃
%→ ~ POSNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
   %αααααααα∀ααα$   %αβα∀ααα$        %αβα∀αα$
		      ↓                ↓
		     N∞40∞*                N∞4n∞*

negative big number
~
↓
~  ⊂ααααααααπααα⊃   ⊂αααπααα⊃        ⊂αααπαα⊃
%→ ~ NEGNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
   %αααααααα∀ααα$   %αβα∀ααα$        %αβα∀αα$
    	              ↓		       ↓
		     N∞40∞*                N∞4n∞*


.BEGIN TURN ON "←";
←∞2Structure of a BIGNUM∞*
.END
.END


.BEGIN GROUP;
The value of Bignum is given by:
.BEGIN CENTER
%7b%1(N%40%1 + %7a%1N%41%*+ ... + %7a%8n%1N%4n%1)
.END

where %7b%1 is either + or  - and 
%7a%1-1 is  the largest number  representable in one  machine word.
The translation between Bignums and the other numeric representations
is done automatically within the evaluator.

In most implementations of LISP, no attempt is made to store numbers uniquely.
Thus %3eq%1 will not work on numbers  other than INUMs; either 
%3equal%1 is extended for numbers
or a special equality predicate for numbers is  provided.
.END
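
To suggest the flavor of such arithmetic, here is a sketch of the addition of two
non-negative Bignum digit lists, least significant digit first. We assume that
%3digitadd%1 adds two machine words and an incoming carry, returning a dotted
pair of sum digit and outgoing carry; sign handling and the conversion back to a
smaller representation are omitted.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

bigadd <= λ[[x;y] big1[x;y;0]]

big1 <= λ[[x;y;c] prog[[dc]
      [and[null[x];null[y]] → return[[zerop[c] → NIL; %et%3 → list[c]]]];
      [null[x] → x ← list[0]];
      [null[y] → y ← list[0]];
      dc ← digitadd[car[x];car[y];c];
      return[cons[car[dc];big1[cdr[x];cdr[y];cdr[dc]]]] ]]
.END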
.SS(Stacks and Threading)
Though  recursive descriptions of algorithms are usually more illuminating than
the typical machine-oriented counterparts, it is frequently more efficient
to encode those algorithms in ways which can take advantage of contemporary
hardware. This section will discuss two techniques which "unwind" the recursion
and typically lead to faster execution.

Recall the marking phase of a garbage collector in {yonss(P146)}. There
we wrote %3mark%* as a recursive algorithm. We could equally well write %3mark%*
using  an explicit stack:
.BEGIN TABIT3(6,12,27);SELECT 3;GROUP;TURN OFF "←";

mark <= λ[[tr]prog[[st]
\loop\[is_marked[tr] → go[chk_st];
\\ is_full_wd[tr]→ markA[tr];go[chk_st];
\\ is_free_wd[tr]→\st←push[cdr[tr];st];
\\\markA[tr]; 
\\\tr←car[tr];go[loop];
\\ %Et%* → go[chk_st]]; ⊗↓%1This  branch of the conditional could be omitted
and the  effect would be the same.←%3
\chk_st\[null[st] → return[%et%*]];
\\tr←top[st];
\\st←pop[st];
\\go[loop] ]]
.APART;
.group;

push <= λ[[i;st] concat[i;st]]
top <= λ[[st] first[st]]
pop <= λ[[st] rest[st]]
.END

Notice that we only save the %3cdr%*-node in the stack; even at that the
stack grows proportionally to the depth of the tree being traversed. 
See the problem on {yon(P162)}.
The technique of using an explicit stack sometimes is more intuitive and 
sometimes will lead to faster execution.

The second technique is more  tricky but will lead to significant pay-offs
in execution time⊗↓with a proportional loss in clarity of code←.
The technique is called %2⊗→threading↔←%*.
The basis for threading is a desire to traverse tree-structures in a more
efficient fashion than that typically available in recursion or via
stacks. Recall that on {yon(P164)} we surmised 
that %2⊗→double-linking↔←%* might be
advantageous in moving up and down the "spine" of a tree structure.
Double links would allow us to find the successors and predecessors of nodes
easily. However the extra link gives us no help if we wish
to descend into the substructure. It is this area to which threading addresses 
itself: descent into tree structure, and in fact, non-recursive tree traversal.
Threading  comes in various flavors; we will now discuss a few.

Look at the new %3mark%*-er above; you should notice that for a
%2fixed%* tree and a %2fixed%* order of traversal,
any two applications of marking will
have the same pattern of behavior. The order of visitation to each node
will obviously be the same, but the dynamic changes in the state  of the stack
will %2also%* be the same. Instead of recreating that stack behavior at each traversal,
it might be possible to store the stack information in the structure itself.
This technique of hiding the control structure in the data structure is called
threading. 
The general application of threading is of questionable utility. It typically
requires a more complex data structure: we must be able to store  both
threads and links. The traversing algorithms also become more complex:
we obviously must be able to recognize the difference between 
control threads and data links. Care must also be taken if we wish to share
threaded list structure. See ⊗↑[Knu#68]↑ for a complete discussion
of the techniques and tricks.

We are not about to complicate the LISP structures, but dispensing with a 
stack, be it implicit or explicit, %2does%* have some merits. What we %2can%*
do in LISP is  strike a compromise. Instead of storing the threads permanently
in the structure, there are significant applications of threading
where we can %2temporarily%* store threads as we traverse trees. 
The first application is in the design of a non-recursive %3read%* program;
the second application we will describe is in the mark phase of a garbage
collector.
.CENT(Problem)
.P162:
With a little
more testing before stacking we can significantly cut down  the
number of %3push%*es we have to do. Namely, if some of the branches point
immediately to atoms we might as well mark them at that time and 
proceed without doing a stack operation. Only when both branches are "non-atomic"
do we need to stack the %3cdr%*. Write such an algorithm.
.SS(A Non-recursive %3read%*)
.P160:
The original %3read%* algorithm of {yonss(P13)} is a good example of
a clear recursive algorithm;
it is reasonably straightforward to follow the flow of the algorithm.

However, now that we understand what the run-time behavior of such a recursive
program is, we can see that %3read%* and friends are a drain on %2two%*
resources: they use free space to %3cons%*-up the S-expr representation of the input;
they use  the run-time stack for  handling  the implementation of the
recursion and for saving parameters and intermediate computations. A deeply
nested expression will use a lot of the run-time stack.
Clearly, there is nothing we can do about the drain on the free lists⊗↓We 
 probably will be drawing on the full word area for print name
storage as well as the free space area for
list structure storage.←, but we %2can%* dispense with the run-time 
stack by threading.
We can in fact do so without a proportional increase in the use of cells in
free space; indeed we need only %2one%* additional free cell, regardless
of the complexity of the input! This is truly a win for efficiency; it will be
a big loss for clarity. But again that's why this section on storage and efficiency
is where it is; we have an understanding of the purpose and intent of %3read%*;
only after the basic algorithm is well understood should we attempt to be
clever and efficient.
First we describe the basic ideas of the algorithm, then we give the algorithm.

The main idea  in the algorithm is the realization that we really can determine
the storage requirements for a complex S-expr or list structure as we read it in.
For example consider 
the input string "%3(A#(B#C)#D)%*". As we start our left-to-right scan of the input
we see "%3(%*". This immediately tells us that we need at least one %3cons%*.
We read "%3A%*"; that tells us what the %3car%* of the expression is. Notice that
we don't yet know whether the expression is "dotted" or "listed", but the
storage requirements will be the same. On reading the next open parenthesis we
know we need to add a new level in the  developing representation. The
"%3B%*" and "%3C%*" add elements to that level, and the closing parenthesis finishes
it off. The closing parenthesis also should signal our parser to return to the
prior level and continue scanning the input. The "%3D%*" goes on that
level and the final closing parenthesis completes the input.
All this is old but the difference now is that we don't use recursion or an explicit
stack to keep track of where we are. We keep a thread in the %3cdr%*-part of 
the last cell on every level. When we go down a level we manufacture a new
cell with the %3cdr%* pointing to the cell we just came from in the previous
level; this happens when we see a left parenthesis. We go up a level when we see
a right parenthesis; that is done by following up the thread in the current level,
after doing appropriate cleanup. 

There are three basic states in the reader: 
.BEGIN INDENT 5,5,5;group;
%21.%*  The next input should go into the %3car%*-part of the current cell.
This state is entered when we go down a level. It is labeled %3head%* in the
following program.

%22.%*  The next input should go on the current level. This is the typical state
in the building of a list-input. Here we add a new cell in the current level 
and put the input in the %3car%*-part of that cell; then stay in this state.
This state corresponds to label %3tail%*.

%23.%*  The other main state occurs on reading a dot when in %3tail%*-state⊗↓dots 
seen in  any other context are errors←. In dot-state we check the next input;
if it's an atom we stuff it on the thread and go up. If it's a  left parenthesis
we add  a new cell and go down.
.END
There are some anomalies in the algorithm due to the desire 
to recognize both S-expr notation
and list-notation. The main malefactor is a count of parentheses; it increments for
left parentheses and decrements for right parentheses. A legal input has been
recognized when we are back at the top level and the count is zero.

The final difference between the old parser and the new one involves
the scanner, %3ratom%*. We assume a new %3ratom%* which reads %3()%* and
returns %3NIL%*. This is not too strenuous an assumption. If the scanner
sees an open parenthesis, it looks ahead to the next meaningful character⊗↓ignoring
spaces and the like←. If the character is a closing parenthesis, it takes it;
if the character is not, it is left for the next call on %3ratom%* and
%3ratom%* returns with an indication that it has seen a left parenthesis.
With this introduction, here is the new %3read%*:

.BEGIN SELECT 3; TABIT3(6,16,27);TURN OFF "←";GROUP;

read <= λ[[]prog[[j;cp;count;top;temp]
\count←init[]; cp←count; top←cp;
head\j←ratom[];
\[or[is_dot[j];is_rpar[j]] → err[];

\ is_lpar[j] →\incr[count];
\\cp←down[cp];
\\go[head];
\ atom[j] → stuff[cp;j]; go[ckend]];
tail\j←ratom[];
\[atom[j] → cp←insert_move[cp;j]; go[ckend];

\ is_rpar[j] →\decr[count];
\\[eq[top;cp] → go[ck1];
\\ %et%* →  cp←stuff_up[cp;NIL]; go[ckend]];

\is_lpar[j] →\incr[count];
\\cp←down[insert_move[cp;NIL]];
\\go[head];

\is_dot[j] →\j←ratom[];
\\[or[is_dot[j];is_rpar[j]] → err[];
\\ is_lpar[j] →\incr[count];
\\\cp←insert_move[cp;NIL];
\\\go[head];

\\ atom[j] →\cp←stuff_up[cp;j];
\\\go[ckend]]]; ⊗↓%1This %3go%1 is superfluous, but makes the flow more apparent.←%3
ckend\[eq[cp;top] → go[ck1];
\ %et%* → go[tail]];
ck1\temp← cnt[top];
end2\[zerop[temp] → return[exp[top]]];
\j←ratom[];
\[is_rpar[j] → temp←sub1[temp]; go[end2];
\ %et%* → err[] ]]]

.END
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(22);
init <= λ[[] cons[NIL;0]]
stuff <= λ[[x;y] rplaca[x;y]]
incr <= λ[[z] rplacd[z;add1[cdr[z]]]]
decr <= λ[[z] rplacd[z;sub1[cdr[z]]]]

insert_move <= λ[[cp;val] rplacd[cp;cons[val;cdr[cp]]]; cdr[cp]]

down <= λ[[cp] rplaca[cp;cons[NIL;cp]];car[cp]]

stuff_up <= λ[[cp;j] prog[[temp]
\temp ← cdr[cp];
\rplacd[cp;j];
\return[temp]]]

cnt <= λ[[x] cdr[x]]    
exp <= λ[[x] car[x]]

.END
The development and understanding of this algorithm required most of what we
have covered in the course. 
We used our knowledge of the parser, %3read%*; we used our familiarity with
 S-exprs stored as linked-lists; we had to understand the run-time control
of recursive calling sequences; we had to understand pointer manipulation;
we had to understand pointer modification; and finally we had to be wickedly
clever.
With that understanding we
were  able to apply threading at a level higher than a "once-only" trick.
.CENT(Problem)
%21.%* Write a version of %3read%* which uses an explicit stack to remember where it
is in the parse.


.NEXT PAGE;
.SS(More Applications of Threading)
.Cent(A link-bending garbage collector for LISP)
.P144:
The use of a stack is one of the difficulties associated with garbage 
collection. Garbage collection is invoked when available space has become
exhausted, but here we are, asking for %2more%* space to use for stacking.
The usual solution is to allocate a separate area for stack storage. This 
has its drawbacks. If we don't allocate enough stack space, i.e., if the
depth of a piece of structure becomes too great, then the marker will fail.
The amount of stack space can become large; proportional to the
depth of a list.
We can apply threading here,
modifying the structure as we traverse it; as usual the threads will be used as
control information. As we finish marking a branch we
restore the structure to its original topology.
Several versions of such threaded collectors are available; see
⊗↑[Chr#68]↑ for a version
written in AMBIT/G; a more traditional description is found in
 ⊗↑[Sch#67]↑⊗↓The correctness of [Sch#67] has been proved by  de#Roever.←;
 and see ⊗↑[Knu#68]↑ for several alternatives.

.Cent(Binding Implementations)
Threading can be used in the shallow binder of {yonss(P228)} to  remember
the path through the environment tree#(⊗↑[Urm#76]↑). 
We thread from E%4bind%1 to E%4inter%1
when we are looking for E%4inter%1. 
This consists of reversing the access links as we proceed towards E%4inter%1.
Then, as we swap back the value cells, we will unthread from
E%4inter%1 to E%4bind%1.
.SS(Storage Management and LISP,,P224:)

There are two basic areas of LISP which require attention:
the implementation of data structures, and the implementation of the
LISP machine. We will discuss applications in that order.

LISP's most general data object is a dotted pair; however we frequently
are found to be using dotted pairs to represent more structured objects.
For example, many common LISP programs involve list-operations on
list representations. But lists, we know, are representations of sequences.
From {yonss(P137)}
we now also know that arrays are efficient representations of sequences.
Indeed array representations are typically more efficient than the
general LISP linked-list.  We would like to capitalize on this more 
efficient representation without jeopardizing the LISP operations.

An analysis of the LISP operations shows that we have to be able to %2share%*
substructures, and if using %3rplaca%* or %3rplacd%*, we have to be able to
%2modify%* existing structures. Any proposed economies in storage layout
must take cognizance of these facts. Fortunately these requirements are compatible.
Consider the typical  representation of the sequence:
.BEGIN CENTER;SELECT 3;

(LAMBDA (X) (F X Y))
.END

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααααπααα⊃   ⊂αααπααα⊃  ⊂αααπαα⊃  
~ LAMBDA ~ #αβαα→~ # ~ #αβα→~ # ~≤'~
%αααααααα∀ααα$   %αβα∀ααα$  %αβα∀αα$ 
		   ~	      ↓
	⊂αααααααααα$     ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
	↓		 ~ F ~ #αβ→~ X ~ #αβ→~ Y ~≤'~
	⊂αααπαα⊃	 %ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
	~ X ~≤'~  
	%ααα∀αα$

.END

This takes seven words. The %3car%*-part  of each cell contains the information;
the %3cdr%*-part tells where the rest of the expression is to be found.
That is, we have dedicated 14 half-words to represent the structure; only seven
of which contain the actual information we wish to store. 
Using some extra encoding we can carry the same information in 
seven⊗↓slightly larger← cells.

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

⊂ααααααααα⊃
~↓ LAMBDA ~
εαααααααααλ     ⊂αααααααα⊃
~↓      #αβαααα→~/     X ~
εαααααααααλ     %αααααααα$
~/      #αβα⊃
%ααααααααα$ ↓   ⊂αααααααα⊃
	    %αα→~↓     F ~
		εααααααααλ
		~↓     X ~
		εααααααααλ
		~/     Y ~
		%αααααααα$

.END

The intent of the special characters  is to encode information about
the %2next%* cell in the representation. It thus is called %2⊗→%3cdr%*-coding↔←%1.
The %b↓%* means the next cell %2is%* the %3cdr%*; %b/%* means the %3cdr%* is 
%3NIL%*.

The typical LISP cell is a third variety of %3cdr%*-coding: the code %b→%* says
the next cell is a pointer to the %3cdr%*. With that, we introduce the final code:
%b*%* means this cell is the %3cdr%*-half of a LISP word.
Thus %3(A#B)%* could be expressed in any of the following forms:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

⊂ααααα⊃			⊂ααααα⊃
~↓  A ~			~→  A ~
εαααααλ			εαααααλ    ⊂ααααα⊃
~/  B ~			~*  #αβααα→~/  B ~
%ααααα$			%ααααα$    %ααααα$


⊂ααααα⊃			⊂αααααπααααα⊃    ⊂αααααπααααα⊃
~→  A ~			~  A  ~   #αβααα→~  B  ~  ≤' ~
εαααααλ   ⊂ααααα⊃	%ααααα∀ααααα$    %ααααα∀ααααα$
~*  #αβαα→~→  B ~
%ααααα$   εαααααλ
	  ~* NIL~
	  %ααααα$

.END
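
A sketch of the %3cdr%1 operation under this encoding; %3code%1, %3item%1, and
%3next_loc%1 are hypothetical primitives which extract the two-bit code of a
cell, extract the pointer or atom stored in a cell, and compute the address of
the next sequential cell, and the constants name the four codes just described:
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

cdr <= λ[[x]
     [eq[code[x];NEXT] → next_loc[x];            %1the next cell is the cdr%3
      eq[code[x];NILCODE] → NIL;                 %1the cdr is NIL%3
      eq[code[x];INDIRECT] → item[next_loc[x]];  %1the next cell points to the cdr%3
      %et%3 → err[] ]]                           %1a cdr-half has no cdr of its own%3
.END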
However this encoding scheme is not sufficient as it stands. Consider the 
following example: Given internal pointers %3x%* and %3z%* into:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

x	       z
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ F ~ #αβαα∀αα→~ X ~ #αβαα→~ Y ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.END

and assume we wish to perform %3rplacd[x;(A#B#C)]%*.
In our standard implementation we would arrive at:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

x	       z
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ F ~ # ~  %αα→~ X ~ #αβαα→~ Y ~≤'~
    %ααα∀αβα$      %ααα∀ααα$   %ααα∀αα$
          ↓
          ~ ⊂αααπααα⊃      ⊂αααπααα⊃   ⊂αααπαα⊃
          %→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
	    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$
.END

But if we assume %3(F X Y)%* is represented in its compact form
we have some troubles.
We can't replace the cell:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

⊂ααααα⊃            ⊂αααααα⊃
~↓  X ~		by ~*   #αβ→ to ∞3(A B C)∞*
%ααααα$            %αααααα$
.END
since %3z%* would get justifiably upset. What we will do is use the
forwarding address scheme we introduced on {yon(P173)} in the compacting
garbage collector.
We put a forwarding address in the cell referenced by %3x%*;
then allocate a new pair of half-cells, putting %3F%* in the first and a pointer to
%3(A#B#C)%* in the second.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

   x	⊂ααααααααα⊃     ⊂αααααααα⊃   	⊂αααααααα⊃
   %→αα→~i      #αβαααα→~↓     F ~ ⊂→αα→~↓     A ~
 	εαααααααααλ     εααααααααλ ↑	εααααααααλ
   z →α→~↓      X ~     ~*     #αβα$	~↓     B ~
	εαααααααααλ     %αααααααα$	εααααααααλ
	~/      Y ~  			~/     C ~
	%ααααααααα$ 			%αααααααα$

.END
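
A corresponding sketch of %3rplacd%1 under this encoding: if the cell already
addresses its %3cdr%1 indirectly, the pointer can simply be overwritten;
otherwise a fresh pair of half-cells is allocated, the old %3car%1 is copied,
and an invisible pointer is left behind. The primitives %3alloc_pair%1,
%3set_car%1, %3set_cdr_ptr%1, and %3set_invisible%1 are, again, only assumed.
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

rplacd <= λ[[x;y] prog[[new]
      [eq[code[x];INDIRECT] → set_cdr_ptr[next_loc[x];y]; return[x]];
      new ← alloc_pair[];                %1two sequential half-cells%3
      set_car[new;item[x]];
      set_cdr_ptr[next_loc[new];y];
      set_invisible[x;new];              %1old references are forwarded%3
      return[x] ]]
.END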


These forwarding addresses are an instance of %2⊗→invisible pointers↔←%*
proposed by R. Greenblatt in his LISP machine; he has also proposed
implementing %3cdr%*-coding in hardware.
Between invisible pointers and 
%3cdr%*-coding, we %2can%* effect all LISP operations using this
potentially more compact representation. 

We  must be able to maintain that compact  representation while the program
is running. That requires more care in the  management of storage.
We cannot simply garbage collect and fragment space; we cannot 
use the simple compacting  garbage collector of {yonss(P291)} since it does not
attempt to  maintain the compact representation. 
Several algorithms
with the desired properties exist (⊗↑[Che#70]↑, ⊗↑[Cla#76]↑).
One feature of this data representation is its  use of variable-sized
linked blocks of sequential storage. The management of these storage
blocks is more complex than that of simple dotted-pairs, but the
additional overhead may be  acceptable if it gives better locality of reference
and faster access to list elements⊗↓Notice that the %3cdr%1-coded representations
of %3(A#B)%1 and %3(A#.#B)%1  are equally expensive. In the typical linked-list
representation, %3(A#B)%1  requires more space than %3(A#.#B)%1.←.

There is less  conflict about the use of  more complex storage
management techniques in the area of LISP's dynamic implementation.
The original versions of LISP#1.5 used dotted pair structure to represent
the access environments⊗↓The control information did use a stack implementation
coded in machine language.←. This generality allowed a correct solution to the
implementation of %3function%1, but  experience with LISP implementations
has shown that it is quite expensive to  maintain this generality
when most applications are of a less general nature.
Implementation techniques, patterned after our Weizenbaum
diagrams, allow some economies without loss of generality.
Again, storage would be allocated in sequential blocks; each block would
be of size sufficient to hold the representation of the name-value entries
along with the additional areas to link the block to the environment.
The storage blocks need not be allocated sequentially; indeed, in the
general case blocks cannot be allocated sequentially. The de-allocation
problems are somewhat different from those experienced 
by data structure representations.
The environment structures are much more "well-behaved" than general list structure.
Therefore an "environment garbage collector" may not be needed.

The most general techniques for management of LISP's dynamic
environment are based on ⊗↑[Bob#73a]↑ and succeeding papers⊗↓There is something
contradictory about LISP implementor's  attitudes toward  storage
and  dynamics. Much effort  is  expended in attempting to
minimize the overhead involved in  the dynamic operation of LISP;
it is frequently stated that the user should not be penalized
for access/control constructs which he does not use. However, that
attitude is not extended to LISP's data structures. There are very
generous subsets of LISP applications in which the data structure operations
are suitably well-behaved, that storage reclamation techniques
less general than garbage collection are applicable. Analysis of this
area of LISP should lead to  profitable  results.←.


At a lower level of implementation, LISP has much to say about machine
organization. The implementation of efficient environment swapping
algorithms is a problem which  any operating system must  face.
The traditional solutions impose   severe restrictions on  inter-process
communications. The  algorithms developed for LISP show promise
for giving efficient implementations of more general scope.

LISP's organization of memory also has lessons for machine architecture. The
management of large variable sized memory spaces like ⊗↑[Ste#73]↑ or ⊗↑[Wegb#70]↑
can be supported in hardware. The  allocation and de-allocation of such large
spaces also require care; LISP implementors have begun to address these problems
(⊗↑[Ste#76a]↑, ⊗↑[Bis#74a]↑).

Finally, the highly interactive nature of modern LISP programming
systems gives direction to efforts  developing  more "reactive"
programming systems (⊗↑[Win#75]↑).

.SS(Hash Techniques,,P248:)
One very significant problem experienced by a LISP programmer is the 
sheer size of data structures which a large LISP program generates.
Many LISP projects  approach 10%87%1 bits of program and data.
Several techniques have been  developed to help shrink  data representation;
%3cdr%1-coding ({yonss(P224)}) is one technique. Another technique stems
from the  observation that LISP tends to %2copy%1 structures rather
than %2share%1 them. We know that  the sharing of structures  must be done
with great care if modification operations like %3rplaca%1 and %3rplacd%1
are present, but  sharing of structure can mean a significant saving in space.
In fact, the saving can also be  reflected in the algorithms which manipulate
the structures. For example  if every list-structure is stored uniquely,
then the time for the equality test %3equal%1 is a constant rather than being
proportional to the  depth of the structure.

There are two alternatives for maintaining unique structure:
either maintain list space such that unique representations are always 
present; or supply an algorithm which will "uniquize" structures upon
request. The first alternative is usually called %2⊗→hash consing↔←%1; the
second technique is called %2list condensation%1#(⊗↑[Lin#73]↑).
A condensation algorithm must remove all duplicated structure from within
a list. Since condensation is a component of many  hashed LISP implementations,
we will concentrate our attention on hash consing.

Hash consing is an extension of the  LISP technique for generating
unique atoms.  Since
list structure is created only by the %3cons%1 operation⊗↓List structure
may be %2modified%1 by %3rplaca%1 and %3rplacd%1, and we will
discuss that in a moment.←,
we place the responsibility for maintaining  unique structure on %3cons%1.
If the result of a pending %3cons%1 is already present, then we
return a pointer to that structure, otherwise we perform the %3cons%1
and record the result  so that it will be retrieved if the same
%3cons%1 happens again. The adjective "hash" is applied to this version of
%3cons%1 since the typical implementation uses a hashing algorithm to
maintain the uniqueness.
Hash %3cons%1ing imposes restrictions on the
programmer's use of list modification operations. If structures are stored
uniquely, severe difficulties result when modifications are made.
One may either  disallow list modification, or may supply additional operations
to copy structure, modify it, and "uniquize" the result; or  an implementation may
supply different kinds of structures, some modifiable, and some not modifiable.
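
A sketch of a hash %3cons%1; the functions %3hash%1, which computes a bucket
index from the addresses of its two arguments, %3bucket%1, which retrieves the
list of pairs in a bucket, and %3add_bucket%1, which enters a pair into a
bucket, are all assumed:
.BEGIN SELECT 3;NOFILL;GROUP;TURN OFF "←";

hcons <= λ[[x;y] prog[[b;z]
      b ← bucket[hash[x;y]];
loop  [null[b] → go[new]];               %1no identical pair exists%3
      z ← car[b];
      [and[eq[car[z];x];eq[cdr[z];y]] → return[z]];
      b ← cdr[b];
      go[loop];
new   z ← cons[x;y];
      add_bucket[hash[x;y];z];
      return[z] ]]
.END
Since the components of hash-consed structures are themselves stored uniquely,
%3eq%1 suffices for the comparisons within a bucket.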

A hash %3cons%1 was proposed for LISP 1.75 on the IBM#M44, but the
implementation was never
completed. A limited version of hash %3cons%1ing was implemented as
an extension of LISP 1.6 at Stanford.

The most impressive and extensive applications of hashing appear
in HLISP (⊗↑[Got#74]↑, ⊗↑[Ter#75]↑). That implementation of LISP
supplies two different kinds of structures: ordinary list structure
and "monocopy" structures. Operations are also supplied for
conversion between types. Extensive analysis of hashing effects
on algorithm performance has also been done (⊗↑[Got#76]↑).
HLISP also employs hashing in its implementation of property lists.
Property lists are not stored explicitly, but rather the atom name and the 
property name are used to form a hash index; the property value is associated
with that hash index.
For example,
%3get[x;i]%1 hashes  with both  %3x%1 and %3i%1 to  retrieve  the property value.

The other major implementations of LISP also
offer specialized operations for dealing with hashed  quantities;
see ⊗↑[Moo#74]↑, ⊗↑[Int#75]↑, and ⊗↑[Bob#75]↑.

.REQUIRE "NOTES.IMP" SOURCE_FILE;